Hello Jack, hi all,

2012/5/28 Jack Krupansky <j...@basetechnology.com>:
> Other obvious metadata from the Twitter API to index would be hashtags, user
> mentions (both the user id/screen name and user name), date/time, urls
> mentioned (expanded if a URL shortener is used), and possibly coordinates
> for spatial search.

You raise good points here. Just to understand better how this works in Solr:
say that we have a tweet that uses a hashtag and mentions another user. I
don't know how this would actually look coming from the Twitter Streaming API,
and I am assuming that at least the tweet itself (excluding date/time and so
on) is just raw text, like

    "Hey @alex1987, thank you for telling me how cool is #rubyonrails"

So, in order to make Solr understand that here we have a mention of a user
(@alex1987) and a hashtag (#rubyonrails), I have to extract that information
myself, include it in my own XML schema, and preprocess the tweet in order to
get to

    <add>
      <doc>
        <field name="tweetId">ABCD1234</field>
        <field name="tweet_text">Hey @alex1987, thank you for telling me how cool is #rubyonrails</field>
        <field name="user">happyRubyist</field>
        <field name="mentions">alex1987</field>
        <field name="hashtags">rubyonrails</field>
      </doc>
    </add>

Correct? I have to preprocess the tweet and make those fields explicit if I
want them to be indexed as metadata, right? I am asking since I am new to
Solr.

> Although, I imagine quite a few people have already done this quite a few
> times before, so maybe somebody could contribute their Twitter Solr schema.
> Anybody?

Oh, that would be nice :-)

Cheers,
Giovanni
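
P.S. To make the preprocessing step I have in mind concrete, here is a minimal
Python sketch. It rests on a few assumptions: that the tweet really does arrive
as raw text, that mentions and hashtags can be found with simple regular
expressions (a simplification, since real tweets have more edge cases), and
that the field names match the example document above. The function name
tweet_to_solr_xml is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Simplified patterns (an assumption): real tweets have edge cases,
# e.g. an e-mail address in the text would be picked up as a mention.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_to_solr_xml(tweet_id, user, text):
    """Build a Solr <add><doc> message from a raw tweet (hypothetical helper)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        # One <field name="..."> element per value.
        el = ET.SubElement(doc, "field", {"name": name})
        el.text = value

    field("tweetId", tweet_id)
    field("tweet_text", text)
    field("user", user)
    # Emit one field element per extracted mention and hashtag.
    for mention in MENTION_RE.findall(text):
        field("mentions", mention)
    for hashtag in HASHTAG_RE.findall(text):
        field("hashtags", hashtag)

    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tweet_to_solr_xml(
        "ABCD1234",
        "happyRubyist",
        "Hey @alex1987, thank you for telling me how cool is #rubyonrails",
    ))
```

A tweet with several mentions or hashtags would produce repeated "mentions" or
"hashtags" fields, so as far as I understand those two fields would need to be
declared multiValued="true" in schema.xml.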