Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread ahmed baseet
As far as I know, Maven is a build/mgmt tool for Java projects, quite similar to Ant, right? No, I'm not using it, so I think I don't need to worry about those pom files. But I'm still not able to figure out the error with the classpath/jar files I mentioned in my previous mails. Shall I try

Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
The Solr distro contains all the jar files. You can take either the latest release (1.3) or a nightly. On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet ahmed.bas...@gmail.com wrote: As far as I know, Maven is a build/mgmt tool for java projects quite similar to Ant, right? No I'm not using this ,

Re: half width katakana

2009-04-28 Thread Koji Sekiguchi
If you use a CharFilter, you should use a CharStream-aware Tokenizer to correct term offsets. There are two CharStreamAware*Tokenizers in trunk/Solr 1.4. Probably you want to use CharStreamAwareCJKTokenizer(Factory). Koji Ashish P wrote: After this should I be using same cjkAnalyzer or use

Re: highlighting html content

2009-04-28 Thread Christian Vogler
Hi Matt, On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com wrote: I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go too far with that idea :) This is what I do. I define the custom

Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Koushik Mitra
Hi, I was trying to extract content from an xlsx file for indexing. However, I am getting a Julian date value for a cell with a date format, and '1.0' in place of '100%'. I want to retain the values as they appear in the xlsx file. Solution appreciated. Thanks, Koushik
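For readers who hit the same symptom: the date number is most likely Excel's serial day count, and '1.0' is the raw fraction behind a percent-formatted cell. The following is a minimal sketch of converting both back to their displayed form on the client side. It assumes the workbook uses Excel's 1900 date system and that the raw values arrive as numbers; the function names are hypothetical and are not part of Tika, POI, or Solr Cell.

```python
from datetime import date, timedelta

# Assumption: the workbook uses Excel's 1900 date system. Counting from
# 1899-12-30 gives correct dates for serials from 61 (1900-03-01) onward.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date (time of day is dropped)."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


def fraction_to_percent(value):
    """Render the raw fraction of a percent-formatted cell as displayed text."""
    return f"{value:.0%}"
```

Whether this belongs in the extraction step or in a pre-processing step before indexing depends on how the file is being fed to Solr.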

Re: half width katakana

2009-04-28 Thread Ashish P
Koji san, Using CharStreamAwareCJKTokenizerFactory gives me the following error: SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to org.apache.solr.analysis.CharStream Maybe you are casting Reader to a subclass. Thanks, Ashish Koji Sekiguchi-2 wrote: If you use

Multiple Facet Dates

2009-04-28 Thread Marc Sturlese
Hey there, I needed multiple date facet functionality, for example to show the latest results in the last day, last week and last month. I wanted to do it with just one query. The date facet part of solrconfig.xml would look like: <str name="facet.date">date_field</str>
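A related way to get several date buckets from a single request is to send one facet.query per range, using Solr date math. This is a different technique from the facet.date configuration in the post, shown only as a sketch. It assumes a date field named date_field (taken from the snippet above) and a hypothetical helper name; the bucket boundaries are illustrative.

```python
from urllib.parse import urlencode


def date_bucket_params(field, query="*:*"):
    """Build request parameters with one facet.query per date bucket."""
    # Solr date math expressions for "last day", "last week", "last month".
    bucket_starts = ["NOW-1DAY", "NOW-7DAYS", "NOW-1MONTH"]
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    for start in bucket_starts:
        params.append(("facet.query", f"{field}:[{start} TO NOW]"))
    return params


# URL-encoded query string, ready to append to a /select URL.
query_string = urlencode(date_bucket_params("date_field"))
```

Each facet.query comes back with its own count in the facet_queries section of the response, so the three buckets can overlap without extra work.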

Re: half width katakana

2009-04-28 Thread Koji Sekiguchi
The exception is expected if you use a CharStream-aware Tokenizer without CharFilters. Please see example/solr/conf/schema.xml for the setting of CharFilter and CharStreamAware*Tokenizer: <!-- charFilter + CharStream aware WhitespaceTokenizer --> <!--

RE: OutofMemory on Highlightling

2009-04-28 Thread Gargate, Siddharth
Is it possible to read only maxAnalyzedChars from the stored field instead of reading the complete field into memory? For instance, in my case, is it possible to read only the first 50K characters instead of the complete 1 MB of stored text? That will help minimize the memory usage (Though, it will still

Re: Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Otis Gospodnetic
Koushik, You didn't say much about how you are doing the extraction. Note that Solr doesn't do any extraction from spreadsheets, even though it has a component (known as Solr Cell) to provide that interface. The actual extraction is done by a tool called Tika, or more precisely, POI, both

Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner jbod...@blackboard.com wrote: Trying to point multiple Solrs on multiple boxes at a single shared directory is almost certainly doomed to failure; the read-only Solrs won't know when the read/write Solr instance has updated the index. I'm

Re: how to reset the index in solr

2009-04-28 Thread Erik Hatcher
On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can I get the rake task for clearing the index of Solr, I mean rake index::rebuild? It would be very helpful, and would also avoid deleting ids manually. How do you currently build your index? But making a Rake task to perform Solr operations

Re: Term highlighting with MoreLikeThisHandler?

2009-04-28 Thread Eric Sabourin
Yes... at least I think so. The highlighting works correctly for me on another request handler... see below the request handler for my MoreLikeThisHandler query. Thanks for your help... Eric <requestHandler name="/mlt" class="solr.MoreLikeThisHandler"> <lst name="defaults"> <str name="fl"

Re: highlighting html content

2009-04-28 Thread Matt Mitchell
Hi Christian, I decided to do something very similar. How do you handle cases where the highlighting is inside of html/xml tags though? I'm getting stuff like this: ?q=jackson <entry type="song" author="Michael <em>Jackson</em>">Bad by Michael <em>Jackson</em></entry> I wrote a regular expression to take care
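The custom-delimiter approach discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the regular expression either poster actually used: the marker strings are hypothetical values for hl.simple.pre and hl.simple.post, chosen so that they contain no angle brackets, and the function name is made up.

```python
import re

# Assumption: hypothetical markers configured as hl.simple.pre / hl.simple.post.
PRE, POST = "[[HL]]", "[[/HL]]"

# Matches one tag, from "<" to the next ">".
TAG = re.compile(r"<[^>]*>")


def apply_highlights(text):
    """Drop markers that fell inside tags, then turn the rest into <em> tags."""
    # Step 1: remove markers inside a tag (for example in an attribute value).
    cleaned = TAG.sub(
        lambda m: m.group(0).replace(PRE, "").replace(POST, ""), text
    )
    # Step 2: the remaining markers sit in text content and become real markup.
    return cleaned.replace(PRE, "<em>").replace(POST, "</em>")
```

The tag pattern is deliberately simple and will misbehave if an attribute value itself contains a ">" character; a real HTML or XML parser is the safer choice for such input.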

Re: Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Erik Hatcher
How are you indexing it? A sample of the CSV file would be helpful. Note that while the CSV update handler is very convenient and very fast, it also doesn't have much in the way of data massaging/transformation - so it might require you to pre-format the data for Solr ingestion, or have a

Re: Solr Performance bottleneck

2009-04-28 Thread Otis Gospodnetic
Hi, You should probably just look at the index version number to figure out if the name changed. If you are looking at segments.gen, you are looking at a file that may not exist in Lucene in the future. Use IndexReader API instead. By refreshes do you mean reopened a new Searcher? Does

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Glen Newton
Amit, You might want to take a look at LuSql[1] and see if it may be appropriate for the issues you have. thanks, Glen [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/4/27 Amit Nithian anith...@gmail.com: All, I have a few questions regarding the data import

Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread ahmed baseet
Thank you very much. Now it's working fine; I fixed those minor classpath issues. Thanks, Ahmed. 2009/4/28 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com the Solr distro contains all the jar files. you can take either the latest release (1.3) or a nightly On Tue, Apr 28, 2009 at 11:34 AM,

Re: Unique Identifiers

2009-04-28 Thread Erik Hatcher
On Apr 28, 2009, at 9:49 AM, ahammad wrote: Is it possible for Solr to assign a unique number to every document? Solr has a UUIDField that can be used for this. But... For example, let's say that I am indexing from several databases with different data structures. The first one has a

Re: Snapinstaller on slave solr server | Can not connect to solr server issue

2009-04-28 Thread payalsharma
To add to that: this issue was caused by the commit script called internally by snapinstaller. The commit script creates the Solr URL to do the commit as shown below: curl_url=http://${solr_hostname}:${solr_port}/${webapp_name}/update commitscript logs: 2009/04/28 18:48:21
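The URL construction quoted above can be mirrored in a few lines, which makes it easy to check what the script would actually call for a given configuration. This is a sketch only: the three parameter names come from the quoted curl_url line, while the function name and the validation step are assumptions added for illustration.

```python
def commit_url(solr_hostname, solr_port, webapp_name):
    """Build the update URL the same way the quoted curl_url line does."""
    # Assumption: an empty component is treated as a configuration error here,
    # since it would yield a URL that cannot reach the update handler.
    for name, value in (
        ("solr_hostname", solr_hostname),
        ("solr_port", solr_port),
        ("webapp_name", webapp_name),
    ):
        if not str(value).strip():
            raise ValueError(f"{name} is empty")
    return f"http://{solr_hostname}:{solr_port}/{webapp_name}/update"


# XML message that asks Solr to commit when posted to the update URL.
COMMIT_BODY = "<commit/>"
```

Printing the result for the slave's actual settings and opening it in a browser is a quick way to confirm that the host, port, and webapp name match the running server.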

Re: newbie question about indexing RSS feeds with SOLR

2009-04-28 Thread Koji Sekiguchi
Just an FYI: I've never tried it, but there seems to be an RSS feed sample in DIH: http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476 Koji Tom H wrote: Hi, I've just downloaded solr and got it working, it seems pretty cool. I have a project which needs to

Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Koji Sekiguchi
Thanh Doan wrote: Assuming a solr search returns 10 listing items as below 1) 4 digital cameras 2) 4 LCD televisions 3) 2 clothing items If we navigate to /electronics we want solr to show us facets specific to 8 electronics items (e.g brand, price). If we navigate to

Re: spellcheck.collate causes StringIndexOutOfBoundsException during startup.

2009-04-28 Thread Koji Sekiguchi
I see you are using firstSearcher/newSearcher event listeners at startup, and they cause the problem. If you don't need them, comment them out in solrconfig.xml. Koji Eric Sabourin wrote: I’m using SOLR 1.3.0 (from download, not a nightly build) apache-tomcat-5.5.27 on Windows XP. When

Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Matt Mitchell
Wow, this looks great. Thanks for this Koji! Matt On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Thanh Doan wrote: Assuming a solr search returns 10 listing items as below 1) 4 digital cameras 2) 4 LCD televisions 3) 2 clothing items If we navigate to

Re: fail to create or find snapshoot

2009-04-28 Thread Jian Han Guo
I think this is a bug. I looked at the class SnapShooter, and its constructor looks like this: public SnapShooter(SolrCore core) { solrCore = core; } This leaves the variable snapDir null, and the variable is never initialized elsewhere, and later in the function

Unable to import data from database

2009-04-28 Thread Ci-man
I am using MS SQL Server and want to index a table. I set up my data-config like this: <dataConfig> <dataSource type="JdbcDataSource" batchSize="25000" autoCommit="true" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"

Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Tue, Apr 28, 2009 at 3:18 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, You should probably just look at the index version number to figure out if the name changed. If you are looking at segments.gen, you are looking at a file that may not exist in Lucene in the future.

Re: Unable to import data from database

2009-04-28 Thread ahammad
Did you define all the fields that you used in schema.xml? Ci-man wrote: I am using MS SQL Server and want to index a table. I set up my data-config like this: <dataConfig> <dataSource type="JdbcDataSource" batchSize="25000" autoCommit="true"

Multiple Queries

2009-04-28 Thread Ankush Goyal
Hi, I have been trying to solve a performance issue: I have an index of hotels with their ids and another index of reviews. Now, when someone queries for a location, the current process gets all the hotels for that location. And, then corresponding to each hotel-id from all the hotel documents,

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Amit Nithian
I do remember LuSQL and a discussion regarding the performance implications of using it compared to the DIH. My only reason to stick with DIH is that we may have other data sources for document loading in the near term that may make LuSQL too specific for our needs. Regarding the bug to write to

RE: facet with group by (or field collapsing)

2009-04-28 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
I began a similar thread under the subject Distinct terms in facet field. One thing I noticed though is that your fields seem to have a lot of controlled values, or lack free text. Are you sure SOLR is what you should be using? Perhaps a traditional RDB would be better and then you would have

Re: Multiple Queries

2009-04-28 Thread Erick Erickson
Have you considered indexing the reviews along with the hotels right in the hotel index? That way you would fetch the reviews right along with the hotels... Really, this is another way of saying "flatten your data" <G>... Your idea of holding all the hotel reviews in memory is also viable, depending

Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Thanh Doan
After posting this question I found this discussion: http://www.nabble.com/Hierarchical-Facets--to7135353.html. So I adapted the scheme with 3 fields (cat, subcat, subsubcat) and hardcoded the hierarchical logic in the UI layer to present a hierarchical taxonomy to the users. The

Re: MacOS Failed to initialize DataSource:db+ DataimportHandler ???

2009-04-28 Thread gateway0
That didn't work either. All my libraries are at /Applications/tomcat/webapps/solr/WEB-INF/lib So is apache-solr-dataimporthandler-1.3.0.jar However I did create a new /lib directory under my solr home at /Applications/solr and copied the jar to that location as well. But no difference. Here

RE: facet with group by (or field collapsing)

2009-04-28 Thread Qingdi
Hi Tim, Thanks for your reply. The index structure in my original post is just an example. We do have many free text fields with different analyzers. I checked your post Distinct terms in facet field, but I think the issues we try to address are different. Yours is to get distinct terms in the

Re: WordDelimiterFilterFactory removes words when options set to 0

2009-04-28 Thread Chris Hostetter
: In trying to understand the various options for : WordDelimiterFilterFactory, I tried setting all options to 0. This seems : to prevent a number of words from being output at all. In particular : "can't" and "99dxl" don't get output, nor do any words containing hyphens. : Is this correct

Re: half width katakana

2009-04-28 Thread Chris Hostetter
: The exception is expected if you use CharStream aware Tokenizer without : CharFilters. Koji: i thought all of the casts had been eliminated and replaced with a call to CharReader.get(Reader) ? : Please see example/solr/conf/schema.xml for the setting of CharFilter and :

RE: fl parameter

2009-04-28 Thread Chris Hostetter
: Anyone able to help with the question below? dealing with fl is a delicate dance in Solr right now .. complicated by both FieldSelector logic and distributed search (where both DocList and SolrDocumentList objects need to be dealt with). I looked at this recently and even I can't remember

Re: half width katakana

2009-04-28 Thread Koji Sekiguchi
Chris Hostetter wrote: : The exception is expected if you use CharStream aware Tokenizer without : CharFilters. Koji: i thought all of the casts had been eliminated and replaced with a call to CharReader.get(Reader) ? Yeah, right. After r758137, ClassCastException should be eliminated.

field type for serialized code?

2009-04-28 Thread Matt Mitchell
Hi, I'm attempting to serialize a simple ruby object into a solr.StrField - but it seems that what I'm getting back is munged up a bit, in that I can't de-serialize it. Is there a field type for doing this type of thing? Thanks, Matt

Re: Multiple Queries

2009-04-28 Thread Amit Nithian
Ankush, It seems that unless reviews are changing constantly, why not do what Erick was saying in flattening your data by storing reviews with the hotel index but re-index your hotels storing the top two reviews. I guess I am suggesting computing the top two reviews for each hotel offline and

Re: how to reset the index in solr

2009-04-28 Thread Geetha
Thank you Erik. Should I write the below code in a rake task at /lib/tasks/solr.rake? I am a newbie to Ruby. Erik Hatcher wrote: On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can i get the rake task for clearing the index of solr, I mean rake index::rebuild, It would be very helpful and also to

Re: Multiple Queries

2009-04-28 Thread Avlesh Singh
Ankush, Your approach works. Fire an "in" query on the review index for all the hotel ids you care about. Create a map of each hotel to its reviews. Cheers Avlesh On Wed, Apr 29, 2009 at 8:09 AM, Amit Nithian anith...@gmail.com wrote: Ankush, It seems that unless reviews are changing constantly, why not
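The two steps suggested in this reply (one query covering all hotel ids, then a map from hotel to reviews) can be sketched like this. The field name hotel_id and both function names are hypothetical, and the review documents are assumed to be plain dictionaries as a client library might return them.

```python
from collections import defaultdict


def reviews_query(hotel_ids, field="hotel_id"):
    """Build one query string matching reviews for any of the given hotels."""
    joined = " OR ".join(str(hotel_id) for hotel_id in hotel_ids)
    return f"{field}:({joined})"


def group_reviews(review_docs, field="hotel_id"):
    """Group review documents into a map of hotel id to its list of reviews."""
    by_hotel = defaultdict(list)
    for doc in review_docs:
        by_hotel[doc[field]].append(doc)
    return dict(by_hotel)
```

This replaces one review query per hotel with a single round trip. For very long id lists, the maxBooleanClauses setting in solrconfig.xml (1024 by default) caps how many ids fit into one query, so large result pages may need batching.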

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
Writing to a remote Solr through SolrJ is in the cards. I may even take it up after the 1.4 release. For now your best bet is to extend the class SolrWriter and override the corresponding methods for add/delete. On Wed, Apr 29, 2009 at 2:06 AM, Amit Nithian anith...@gmail.com wrote: I do remember

Re: how to reset the index in solr

2009-04-28 Thread Geetha
I need a function (through solr-ruby) that will allow us to clear everything. Regards, Sg.. Geetha wrote: Thank you Erik.. Should I write the below code in rake task /lib/tasks/solr.rake? I am newbie to ruby. Erik Hatcher wrote: On Apr 24, 2009, at 1:54 AM, sagi4 wrote: Can i

Re: field type for serialized code?

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
Is the serialized data a UTF-8 string? On Wed, Apr 29, 2009 at 6:42 AM, Matt Mitchell goodie...@gmail.com wrote: Hi, I'm attempting to serialize a simple ruby object into a solr.StrField - but it seems that what I'm getting back is munged up a bit, in that I can't de-serialize it. Is there