indexing unstructured text (tweets)

Giovanni Gherdovich Mon, 28 May 2012 04:38:51 -0700

Hi all.

I am in the process of setting up Solr for my application,
which does full-text search over a collection of tweets from Twitter.


I am afraid I am missing something.
From the book I am reading, "Apache Solr 3 Enterprise Search Server",
it looks like Solr works with structured input, like XML or CSV,
while I have the wildest, most unstructured input ever (tweets).
A section named "Indexing documents with Solr Cell" seems to address my problem,
but it also suggests that, before the data reaches Solr, I might need to use
another Apache tool called Tika.

Can anybody provide a brief explanation of the general picture?
Can I index my tweets with Solr alone?
Or do I also need to put Tika in my pipeline?

Best regards,
Giovanni Gherdovich
