Re: How can I index this?
That would certainly work. Just as a general thing, how would one go about indexing Sharepoint content anyway? I heard about the Sharepoint connector for Lucene but I know nothing about it. Is there a standard best practice method? Also, what are your thoughts on extending the DIH? Is that recommended? Thanks for the input :) -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-index-this-tp3666106p3670392.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can I index this?
Perhaps I was a little unclear... Normally, when I have DB access, I do a regular indexing process using DIH. For these two sources, I do not have direct DB access; I can only view them like any end-user would. I do have a Java class that can get the information I need. That class fetches the information through HTTP requests and does not have DB access. It is currently being used for other purposes, but I can take it and use it for Solr as well. Does that make sense? Knowing all that, namely that I cannot access the DB directly but can make HTTP requests to get the info, how can I index that info? Please let me know if this clarifies what I am trying to do. Regards -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-index-this-tp3666106p3666590.html Sent from the Solr - User mailing list archive at Nabble.com.
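A rough sketch of that approach with SolrJ, pushing documents built from whatever the HTTP-fetching class returns; the core URL and the field names here are assumptions, and the helper method just stands in for the existing class:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PushIndexer {

    // Map whatever the existing HTTP-fetching class returns onto Solr fields.
    // The field names (id, title, body) are assumptions; use whatever the schema defines.
    static void indexOne(SolrServer server, String id, String title, String body) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("title", title);
        doc.addField("body", body);
        server.add(doc);
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
        // Loop over the items returned by the existing class and index each one.
        indexOne(server, "example-1", "Example title", "Example body text");
        server.commit(); // make the added documents searchable
    }
}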
How can I index this?
Hello, I am looking into indexing two data sources. One of them is a standard website and the other is a Sharepoint site. The problem is that I have no direct database access. Normally I would just use the DIH and get what I need from the DB. I do have a java DAO (data access object) class that I am already using directly to fetch information for a different purpose. In cases like this, what would be the best way to index the data? Should I somehow integrate Nutch as the crawler? Should I write a custom DIH? Can I use the DAO that I have in conjunction with the DIH? I am really looking for some recommendations here. I do have a few hacks that could be done (copy the data into a DB and index with DIH), but I am interested in the proper way. Any insight will be greatly appreciated. Cheers -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-index-this-tp3666106p3666106.html Sent from the Solr - User mailing list archive at Nabble.com.
Upgrading from 1.4 to the latest version
I was doing some reading on the new features and whatnot, and I am interested in upgrading. I have a few questions though: 1) The index format seems to have changed; can I reuse the current index, or should I reindex the data? I read some things about optimizing the index, but I am not clear on that. 2) Will SolrJ still work? 3) Have there been any changes in the config files or the schema files such that my existing files won't work, or can I simply reuse them? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrading-from-1-4-to-the-latest-version-tp3651234p3651234.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about morelikethis and multiple fields
I don't quite understand what you mean by that. Did you mean TermVector Components? Also, I did some more digging and I found some messages on this mailing list about filtering. From what I understand, using the standard query handler (solr/select/?q=...) with a qt parameter allows you to filter on the initial response using the fq parameter. While this is not a perfect solution for my application, it will greatly reduce any errors that I may get in the data. However, when I tried fq, all it's doing is filtering on the result set from the mlt handler, not the initial response. I need to filter on both the initial response and the result set. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-morelikethis-and-multiple-fields-tp1836778p1837351.html Sent from the Solr - User mailing list archive at Nabble.com.
Question about morelikethis and multiple fields
Hello, I'm trying to implement a "Related Articles" feature within my search application using the mlt handler. To give you a little background information, my Solr index contains a single core that is created by merging 10+ other cores. Within this core is my main data item known as an "article"; however, there are other data items like "technical documents", "tickets", etc. When a user opens an article on my web application, I want to show "Related Articles" based on 2 fields (title and body). I am using SolrJ as a back-end for this. The way I'm thinking of doing it is to search on the title of the existing article and hope that the first hit is the actual article. This works in most cases, but occasionally it grabs either the wrong article or a different type of data item altogether (the first hit may be a technical document, which is totally unrelated to articles). The following is my query: ?qt=%2Fmlt&mlt.match.include=true&mlt.mindf=1&mlt.mintf=1&mlt.fl=title,body&q=&fq=dataItem:article&debugQuery=true One main thing that I noticed is that this only seems to match on the "body" field and not the "title" field. I think it's doing what it's supposed to and I'm not fully grasping the idea of mlt. So when it does the initial search to find the document against which it will find related articles, what search handler does it use? Normally, my queries are carried out using dismax with some boosting functionality applied to them. When I use the standard query handler, however, with the qt parameter defining mlt, what happens for the initial search? Also, if anybody can suggest an alternative implementation I would greatly appreciate it. Like I said, it's entirely possible that I don't fully understand mlt and it's causing me to implement stuff in a weird way. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-morelikethis-and-multiple-fields-tp1836778p1836778.html Sent from the Solr - User mailing list archive at Nabble.com.
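For reference, a sketch of how the same /mlt request might be issued through SolrJ; the server URL is an assumption and the query string is just an example title lookup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RelatedArticles {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0"); // assumed URL

        SolrQuery query = new SolrQuery("title:\"Some article title\""); // lookup of the source article
        query.setQueryType("/mlt");                  // route the request to the MoreLikeThis handler
        query.setParam("mlt.fl", "title,body");      // fields used to find similar documents
        query.setParam("mlt.mintf", "1");
        query.setParam("mlt.mindf", "1");
        query.setParam("mlt.match.include", "true");
        query.addFilterQuery("dataItem:article");    // filters the returned related documents to articles

        QueryResponse rsp = server.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("title"));
        }
    }
}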
Shards VS Merged Core?
Hello all, I'm just wondering what the benefits/consequences are of using shards or merging all the cores into a single core. Personally I have tried both, but my document set is not large enough that I can actually test performance and whatnot. What is a better approach of implementing a search mechanism on multiple cores (10-15 cores)? -- View this message in context: http://lucene.472066.n3.nabble.com/Shards-VS-Merged-Core-tp1738771p1738771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Matching exact words
Hello Erick, Thanks for the reply. I am a little confused by this whole stemming thing. What exactly does it refer to? Basically, I already have a field which is essentially a collection of many other fields (done using copyField). This field is a text field. So what you're saying is to have a duplicate of this field with different properties such that it does not stem? When querying, I assume that I will have to explicitly specify which field to search against...is this correct? I'm a little rusty on the solr stuff to be honest so please bear with me. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Matching-exact-words-tp1353350p1357027.html Sent from the Solr - User mailing list archive at Nabble.com.
Matching exact words
Hello, I have a case where if I search for the word "windows", I get results containing both "windows" and "window" (and probably other things like "windowing" etc.). Is there a way to find exact matches only? The field in which I am searching is a text field, which as I understand causes this behaviour. I cannot use a string field because it is very restricted, but what else can be done? I understand there are other types of text fields that are more strict than the standard field. Ideally I would like to keep my index the way it is, with the ability to force exact matches. For example, if I can search "windows -window" or something like that, that would be great. Or if I can wrap my query in a set of quotes to tell it to match exactly. I've seen that done before but I cannot get it to work. As a reference, here is my query: q={!boost b=$db v=$qq defType=$sh}&qq=windows&db=recip(ms(NOW,lastModifiedLong),3.16e-11,1,1)&sh=dismax To be quite frank, I am not very familiar with this syntax. I am just using whatever my old coworker left behind. Any tips on how to find exact matches or improve the above query will be greatly appreciated. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Matching-exact-words-tp1353350p1353350.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance issues when querying on large documents
Hello, I have an index with lots of different types of documents. One of those types basically contains extracts of PDF docs. Some of those PDFs can have 1000+ pages, so there is a lot of text to search through. I am experiencing really terrible performance when querying. My whole index has about 270k documents, but fewer than 1000 of those are the PDF extracts. The slow querying occurs when I search only on those PDF extracts (by specifying filters) and return 100 results. Returning 100 results definitely adds to the issue, but even cutting that down can be slow. Is there a way to improve querying when the documents and result sets are this large? To give an idea, querying for a single word can take a little over a minute, which isn't really viable for an application that revolves around searching. For now, I have limited the results to 20, which makes the query execute in roughly 10-15 seconds. However, I would like to have the option of returning 100 results. Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: multi-valued associated fields
In our deployment, we thought that complications might arise when attempting to hit the Solr server with the addresses of too many cores. For instance, we have 15+ cores running at the moment. In the worst case, we would have to use the addresses of all 15+ cores to search all our data. What we eventually did was combine all the cores into a single core, which basically gives us a cleaner solution. You get the simplicity of querying one core, but keep the flexibility of modifying cores separately. Basically, we have all the cores indexing separately. We set up a script that uses the index merge functionality of Solr to combine all the indexes into a single index accessible through one core. Yes, there will be some overhead on the server, but I believe that it's a good compromise. In our case, we have multiple servers at our disposal, so this was not a problem to implement. It all depends on your data set and the volume of documents that you will be indexing. -- View this message in context: http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html Sent from the Solr - User mailing list archive at Nabble.com.
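For reference, the merge step in a script like that comes down to one CoreAdmin request per run; a rough Java sketch, where the host, target core name and index paths are assumptions:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class MergeCores {
    public static void main(String[] args) throws Exception {
        // CoreAdmin mergeindexes: merge the on-disk indexes of two source cores
        // into the target core "combined".
        String url = "http://localhost:8080/solr/admin/cores?action=mergeindexes"
                + "&core=combined"
                + "&indexDir=" + URLEncoder.encode("/solrHome/core1/data/index", "UTF-8")
                + "&indexDir=" + URLEncoder.encode("/solrHome/core2/data/index", "UTF-8");

        BufferedReader in = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line); // CoreAdmin status response
        }
        in.close();

        // The merged documents only become visible after a commit on the target core.
        new URL("http://localhost:8080/solr/combined/update?commit=true").openStream().close();
    }
}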
Re: multi-valued associated fields
I had the same problem as you last year, i.e. indexing stuff from different sources with different characteristics. The way I approached it was to set up a multi-core environment, with each core representing one type of data. Within each core, I had a "data type" sort of field that defines what kind of data is stored (i.e. in your case, it would be "auto" or "real estate" etc...). The advantage of this setup is that it allows you to make changes to individual cores without affecting anything else. Also, faceting based on category is achieved through the data type field. You can search on multiple cores like you would on a single core, meaning that all the search parameters can be applied. Solr will automatically merge all the data into one result set. Another advantage is that if you index frequently, this approach allows you to index at different times and reduce the overall load. Just a thought on an approach... -- View this message in context: http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813275.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Issue with delta import (not finding data in a column)
Hello, I am not reusing the context object. The remaining part of the code takes in a "Blob" object, converts it to a FileInputStream, and reads the contents using PDFBox. It does not deal with anything related to Solr. The Transformer doesn't even execute the remaining part of the code. It doesn't get that far. Let me know if you need any more information. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p812818.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Issue with delta import (not finding data in a column)
Hello, I was doing some more testing but I could not find a definitive reason for this behavior. The following is my transformer:

public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
    List<Map<String, String>> fields = context.getAllEntityFields();
    for (Map<String, String> field : fields) {
        // Check if this field has blob="true" specified in the data-config.xml
        String blob = field.get("blob");
        if ("true".equals(blob)) {
            String columnName = field.get("column");
            // Get the field's value from the current row
            Blob data = (Blob) row.get(columnName);
            // Transform the blob and store it back into the same column
            if (data != null) {
                row.put(columnName, process(data));
            } else {
                log.error("Blob is null.");
            }
        }
    }
    return row;
}

Note: The function "process" is the function that actually takes care of the whole transformation. What I noticed is that the "row" variable only has the ID, probably due to this: deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', '-mm-dd HH24:MI:SS'))" However, even if I change it to a "select * " statement, I get everything except the column that contains the blob (it is returned as null). Something tells me that the data-config may be incorrect. I cannot explain how this works for full-imports and not delta-imports. I hope that I explained this issue properly. I am really stuck on this. Any help would be highly appreciated.

ahammad wrote:
> I have a Solr core that retrieves data from an Oracle DB. The DB table has a few columns, one of which is a Blob that represents a PDF document. In order to retrieve the actual content of the PDF file, I wrote a Blob transformer that converts the Blob into the PDF file, and subsequently reads it using PDFBox. The blob is contained in a DB column called DOCUMENT, and the data goes into a Solr field called fileContent, which is required.
>
> This works fine when doing full imports, but it fails for delta imports. I debugged my transformer, and it appears that when it attempts to fetch the blob stored in the column, it gets nothing back (i.e. null). Because the data is essentially null, it cannot retrieve anything, and cannot store anything into Solr. As a result, the document does not get imported. I am not sure what the problem is, because this only occurs with delta imports.
>
> Here is my data-config file:
>
> <dataConfig>
>   <dataSource ... user="user" password="pass"/>
>   <document>
>     <entity ... deltaImportQuery="select * from TABLE1 where ID ='${dataimporter.delta.ID}'"
>             deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', '-mm-dd HH24:MI:SS'))"
>             transformer="BlobTransformer">
>       <field ... blob="true"/>
>       <field ... name="lastModified" />
>     </entity>
>   </document>
> </dataConfig>
>
> Thanks.
>
-- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p812511.html Sent from the Solr - User mailing list archive at Nabble.com.
Issue with delta import (not finding data in a column)
I have a Solr core that retrieves data from an Oracle DB. The DB table has a few columns, one of which is a Blob that represents a PDF document. In order to retrieve the actual content of the PDF file, I wrote a Blob transformer that converts the Blob into the PDF file, and subsequently reads it using PDFBox. The blob is contained in a DB column called DOCUMENT, and the data goes into a Solr field called fileContent, which is required. This works fine when doing full imports, but it fails for delta imports. I debugged my transformer, and it appears that when it attempts to fetch the blob stored in the column, it gets nothing back (i.e. null). Because the data is essentially null, it cannot retrieve anything, and cannot store anything into Solr. As a result, the document does not get imported. I am not sure what the problem is, because this only occurs with delta imports. Here is my data-config file: Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p788993.html Sent from the Solr - User mailing list archive at Nabble.com.
Adding a prefix to fields
Hello, Is it possible to add a prefix to the data in a Solr field? For example, right now, I have a field called "id" that gets data from a DB through the DataImportHandler. The DB returns a 4-character string like "ag5f". Would it be possible to add a prefix to the data that is received? In this specific case, the data relates to articles. So effectively, if the DB has "ag5f" as an ID, I want it to be stored as "Article_ag5f". Is there a way to define a prefix of "Article_" for a certain field? I am aware that this can be done by writing a transformer. I already have 4 transformers handling a multitude of other things, and I would prefer an alternative... Thanks -- View this message in context: http://www.nabble.com/Adding-a-prefix-to-fields-tp25062226p25062226.html Sent from the Solr - User mailing list archive at Nabble.com.
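If a custom transformer does turn out to be the fallback, the one needed here is only a few lines; the column name "id" and the "Article_" prefix are taken from the example above:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Prepends "Article_" to the id column before the value reaches the Solr field.
public class PrefixTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id");
        if (id != null) {
            row.put("id", "Article_" + id);
        }
        return row;
    }
}

That said, DIH also ships a TemplateTransformer, which can do the same thing declaratively with something like template="Article_${entityname.id}" on the field (where entityname is the entity's name), so a new transformer class may not be needed at all.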
Re: Strange error with shards
Each core has a different database as a datasource, which means that they have different DB structures and fields. That is why the schemas are different. I figured out the cause of this problem. You were right, it was the uniqueKey field. All of my cores have that field set to "id" but for this new core, it is set to "threadID". Changing that to id fixed the problem. Shalin Shekhar Mangar wrote: > > On Tue, Aug 18, 2009 at 9:01 PM, ahammad wrote: > >> HTTP Status 500 - null java.lang.NullPointerException at >> >> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) >> at >> >> The way I created this shard was to copy an existing one, erasing all the >> data files/folders, and modifying my schema/data-config files. So the >> core >> settings are pretty much the same. >> > > What did you modify in the schema? All the shards should have the same > schema. That exception can come if the uniqueKey is missing/null. > > If all the shards should have the same schema, then what is the point of > sharding in the first place? I thought that it was used to combine > different cores with different index structures...Right now, every core I > have is unique, and every schema is different... > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Strange-error-with-shards-tp25027486p25043859.html Sent from the Solr - User mailing list archive at Nabble.com.
Strange error with shards
Hello, I have been using multicore/shards for the past 5 months or so with no problems at all. I just added another core to my Solr server, but for some reason I can never get the shards working when that specific core is anywhere in the URL (either in the shards list or the base URL). HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:281) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527) at java.lang.Thread.run(Thread.java:619) The way I created this shard was to copy an existing one, erasing all the data files/folders, and modifying my schema/data-config files. So the core settings are pretty much the same. If I try the shard parameter with any of the other 7 cores that I have, it works fine. It's only when this specific one is in the URL... Cheers -- View this message in context: http://www.nabble.com/Strange-error-with-shards-tp25027486p25027486.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question regarding merging Solr indexes
Yes, that is exactly what I did. If I copy that link, I get a 404 error saying that I need a core name in the URL. If I add the core name in the URL, I get forwarded to the core's admin panel, and nothing happens. Am I missing something else? Shalin Shekhar Mangar wrote: > > On Fri, Aug 7, 2009 at 10:45 PM, ahammad wrote: > >> >> Hello, >> >> I have a MultiCore setup with 3 cores. I am trying to merge the indexes >> of >> core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear >> on >> what needs to happen. >> >> This is what I used: >> >> >> http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true >> >> When I hit this I just go to the admin page for core3. Maybe the way I >> reference the indexes is incorrect? What path goes there anyway? >> > > Look at > http://wiki.apache.org/solr/MergingSolrIndexes#head-0befd0949a54b6399ff926062279afec62deb9ce > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Question-regarding-merging-Solr-indexes-tp24868670p24887460.html Sent from the Solr - User mailing list archive at Nabble.com.
Question regarding merging Solr indexes
Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used: http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true When I hit this I just go to the admin page for core3. Maybe the way I reference the indexes is incorrect? What path goes there anyway? Thanks -- View this message in context: http://www.nabble.com/Question-regarding-merging-Solr-indexes-tp24868670p24868670.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with retrieving field from database using DIH
I looked at the DIH debug page to to be honest I'm not sure how to use it well and get something out of it. I am using a solr 1.4 nightly from March. Cheers Noble Paul നോബിള് नोब्ळ्-2 wrote: > > you can try going to the DIH debug page. BTW which version of DIH are you > using? > > On Fri, Jul 31, 2009 at 6:31 PM, ahammad wrote: >> >> Hello, >> >> I tried it using the debug and verbose parameters in the address bar. >> This >> is what appears in the logs: >> >> INFO: Starting Full Import >> Jul 31, 2009 8:54:40 AM org.apache.solr.handler.dataimport.SolrWriter >> readIndexerProperties >> INFO: Read dataimport.properties >> Jul 31, 2009 8:54:40 AM org.apache.solr.handler.dataimport.DataImporter >> doFullImport >> SEVERE: Full Import failed >> java.lang.NullPointerException >> at >> org.apache.solr.handler.dataimport.DebugLogger.peekStack(DebugLogger.java:78) >> at >> org.apache.solr.handler.dataimport.DebugLogger.log(DebugLogger.java:98) >> at >> org.apache.solr.handler.dataimport.SolrWriter.log(SolrWriter.java:248) >> at >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:305) >> at >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224) >> at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167) >> at >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316) >> at >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374) >> at >> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:187) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) >> at >> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) >> at >> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) >> at >> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) >> at >> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) >> at >> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) >> at java.lang.Thread.run(Unknown Source) >> Jul 31, 2009 8:54:40 AM org.apache.solr.update.DirectUpdateHandler2 >> rollback >> INFO: start rollback >> Jul 31, 2009 8:54:40 AM org.apache.solr.update.DirectUpdateHandler2 >> rollback >> INFO: end_rollback >> >> >> It's different than before because this fails right away. Before adding >> debug/verbose, it would go through all the rows. 
>> >> It is possible that the last modified column may be missing some data in >> some rows. The import, however, fails for every single row, which is >> impossible. I am positive that there is data in that column. >> >> Any other suggestions? >> >> Cheers >> >> >> ahammad wrote: >>> >>> Hello all, >>> >>> I've been having this issue for a while now. I am indexing a Sybase >>> database. Everything is fantastic, except that there is 1 column that I >>> can never get back. I don't have direct database access via Sybase >>> client, >>> but I was able to extract the data using some Java code. >>> >&g
Re: Problem with retrieving field from database using DIH
Hello, I tried it using the debug and verbose parameters in the address bar. This is what appears in the logs: INFO: Starting Full Import Jul 31, 2009 8:54:40 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Jul 31, 2009 8:54:40 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.NullPointerException at org.apache.solr.handler.dataimport.DebugLogger.peekStack(DebugLogger.java:78) at org.apache.solr.handler.dataimport.DebugLogger.log(DebugLogger.java:98) at org.apache.solr.handler.dataimport.SolrWriter.log(SolrWriter.java:248) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:305) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Jul 31, 2009 8:54:40 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jul 31, 2009 8:54:40 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback It's different than before because this fails right away. Before adding debug/verbose, it would go through all the rows. It is possible that the last modified column may be missing some data in some rows. The import, however, fails for every single row, which is impossible. I am positive that there is data in that column. Any other suggestions? Cheers ahammad wrote: > > Hello all, > > I've been having this issue for a while now. I am indexing a Sybase > database. Everything is fantastic, except that there is 1 column that I > can never get back. I don't have direct database access via Sybase client, > but I was able to extract the data using some Java code. 
> > The field is essentially a Last Modified field. In the DB I believe that > it is of type long. In the Java program that I have, I am able to retrieve > the data that is in that column and put it in a variable of type Long. > This is not the case in Solr, however. > > I set the variable in the schema as required to see why the data is never > stored: > required="true"/> > > This is what I get in the Tomcat logs: > > org.apache.solr.common.SolrException: Document [00069391] missing required > field: lastModified > at > org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59) > at > org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:67) > at > org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:276) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:373) > at > org.apache.solr.handler.dataimport.DocBuilder.doF
Problem with retrieving field from database using DIH
Hello all, I've been having this issue for a while now. I am indexing a Sybase database. Everything is fantastic, except that there is 1 column that I can never get back. I don't have direct database access via Sybase client, but I was able to extract the data using some Java code. The field is essentially a Last Modified field. In the DB I believe that it is of type long. In the Java program that I have, I am able to retrieve the data that is in that column and put it in a variable of type Long. This is not the case in Solr, however. I set the variable in the schema as required to see why the data is never stored: This is what I get in the Tomcat logs: org.apache.solr.common.SolrException: Document [00069391] missing required field: lastModified at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59) at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:67) at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:276) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:373) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) >From what I can gather, it is not finding the data and/or column, and thus cannot populate the required field. However, the data is there, which I was able to prove outside of Solr. Is there a way to generate more descriptive logs for this? I am completely lost. I hit this problem a few months ago but I was never able to resolve it. Any help on this will be much appreciated. BTW, Solr was successful in retrieving data from other columns in the same table... Thanks -- View this message in context: http://www.nabble.com/Problem-with-retrieving-field-from-database-using-DIH-tp24746530p24746530.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about formatting the results returned from Solr
Yes, I get that. The problem arises when you have multiple authors. How can I know which first name goes with which user id etc... Cheers Noble Paul നോബിള് नोब्ळ्-2 wrote: > > apparently all the dat ais going to one field 'author' > > instead they should be sent to separate fields > author_fname > author_lname > author_email > > so you would get details like > > John > Doe > j...@doe.com > > > > On Wed, Jul 29, 2009 at 7:39 PM, ahammad wrote: >> >> Hi all, >> >> Not sure how good my title is, but here is a (hopefully) better >> explanation >> on what I mean. >> >> I am indexing a set of articles from a DB. Each article has an author. >> The >> author is saved in then the DB as an author ID, which is a number. >> >> There is another table in the DB with more relevant information about the >> author. Basically it has columns like: >> >> id, firstname, lastname, email, userid >> >> I set up the DIH so that it returns the userid, and it works fine: >> >> >> jdoe >> msmith >> >> >> Would it be possible to return all of the information about the author >> (first name, ...) as a subset of the results above? >> >> Here is what I mean: >> >> >> >> John >> Doe >> j...@doe.com >> >> ... >> >> >> Something similar to that at least... >> >> Not sure how descriptive I was, but any pointers would be highly >> appreciated. >> >> Cheers >> >> -- >> View this message in context: >> http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24737962.html Sent from the Solr - User mailing list archive at Nabble.com.
Question about formatting the results returned from Solr
Hi all, Not sure how good my title is, but here is a (hopefully) better explanation of what I mean. I am indexing a set of articles from a DB. Each article has an author. The author is saved in the DB as an author ID, which is a number. There is another table in the DB with more relevant information about the author. Basically it has columns like: id, firstname, lastname, email, userid I set up the DIH so that it returns the userid, and it works fine: jdoe msmith Would it be possible to return all of the information about the author (first name, ...) as a subset of the results above? Here is what I mean: John Doe j...@doe.com ... Something similar to that at least... Not sure how descriptive I was, but any pointers would be highly appreciated. Cheers -- View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr MultiCore query
Hello joe_coder, Are you using the default example docs in your queries? If so, then I see that the word "ipod" appears in a field called "name". By default, the default search field (defined in solrconfig.xml) is the field called "text". This means that when you submit a query without specifying which field to look for (using the field:query) notation, Solr automatically assumes that you are looking in the field called "text". If you change your query to q=name:ipod, you should get the results back. One way to prevent this is to change your default search field to something else. Alternatively, if you want to search on multiple fields, you can copy all those fields to the "text" field and go from there. This can be useful if for example you had a book library to search through. You may need to search on title, short summary, description etc simultaneously. You can copy all those things to the text field and then search on the text field, which contains all the information that you wanted to search on. joe_coder wrote: > > Thanks ahammad for the quick reply. > > As suggested, I am trying out multi core way of implementing the search. I > am trying out the multicore example and getting stuck at an issue. Here is > what I did and the issue I am facing > > 1) Downloaded 1.4 and started the multicore example using java > -Dsolr.solr.home=multicore -jar start.jar > > 2) There were 2 files present under example/multicore/exampledocs/ , which > I > added to 2 cores respectively. ( Totally 3 docs are present in those 2 > files > and all have the word 'ipod' in it ) > > 3) When I query using > http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=*:*I > get all the 3 results. > > But when I query using > http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q= > *ipod* , I get no results :( > > What could be the issue ? > > Thanks! > > > On Fri, Jul 17, 2009 at 7:20 PM, ahammad wrote: > >> >> Hello, >> >> I'm not sure what the best way is to do this, but I have done something >> identical. >> >> I have the same requirements, ie several datasources. I also used SolrJ >> and >> jsp for this. The way I ended up doing it was to create a multi core >> environment, one core per datasource. When I do a query across several >> datasources, I use shards. Solr automatically returns a "hybrid" result >> set >> that way, sorted by solr's default scoring. >> >> Faceting comes in the picture when you want to show the number of >> documents >> per datasource and have the ability to narrow down the result set. The >> way >> I >> did it was to add a field called "dataSource" to all the documents, and >> injected them with a default value of the data source name (in your case, >> D1, D2 ...). You can do this by adding this in the schema: >> >> > required="true" default="D1"/> >> >> When you perform a query across multiple datasources, you will use >> shards. >> Here is an example: >> >> >> http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some >> query >> >> That will search on both cores 1 and 2. >> >> To facet on the datasource in order to be able to categorize the result >> set, >> you can simply add this snippet to the query: >> >> &facet=on&facet.field=dataSource >> >> This will return the datasources that are defined with their number of >> results for the query. 
>> >> Making the facet results clickable in order to narrow down the results >> can >> be achieved by adding a filter to the query and filtering to a specific >> dataSource. I actually ended uo creating a fairly intuitive front-end for >> my >> system with faceting, filtering, paging etc all using jsp and SolrJ. >> SolrJ >> is powerful enough to handle all of the backend processing. >> >> Good luck! >> >> >> >> >> >> >> joe_coder wrote: >> > >> > I missed adding some size related information in the query above. >> > >> > D1 and D2 would have close to 1 million records each >> > D3 would have ~10 million records. >> > >> > Thanks! >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Solr-MultiCore-query-tp24534383p24534793.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Solr-MultiCore-query-tp24534383p24539215.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr MultiCore query
Hello, I'm not sure what the best way is to do this, but I have done something identical. I have the same requirements, ie several datasources. I also used SolrJ and jsp for this. The way I ended up doing it was to create a multi core environment, one core per datasource. When I do a query across several datasources, I use shards. Solr automatically returns a "hybrid" result set that way, sorted by solr's default scoring. Faceting comes into the picture when you want to show the number of documents per datasource and have the ability to narrow down the result set. The way I did it was to add a field called "dataSource" to all the documents and inject it with a default value of the data source name (in your case, D1, D2 ...). You can do this by adding this in the schema: When you perform a query across multiple datasources, you will use shards. Here is an example: http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some query That will search on both cores 1 and 2. To facet on the datasource in order to be able to categorize the result set, you can simply add this snippet to the query: &facet=on&facet.field=dataSource This will return the datasources that are defined with their number of results for the query. Making the facet results clickable in order to narrow down the results can be achieved by adding a filter to the query and filtering to a specific dataSource. I actually ended up creating a fairly intuitive front-end for my system with faceting, filtering, paging etc all using jsp and SolrJ. SolrJ is powerful enough to handle all of the backend processing. Good luck! joe_coder wrote: > > I missed adding some size related information in the query above. > > D1 and D2 would have close to 1 million records each > D3 would have ~10 million records. > > Thanks! > -- View this message in context: http://www.nabble.com/Solr-MultiCore-query-tp24534383p24534793.html Sent from the Solr - User mailing list archive at Nabble.com.
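A sketch of the same shards-plus-faceting query issued through SolrJ (host and core names follow the URL example above; the commented filter line shows the narrowing step):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultiCoreSearch {
    public static void main(String[] args) throws Exception {
        // Point at either core; the shards parameter fans the query out to both.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core1");

        SolrQuery query = new SolrQuery("some query");
        query.setParam("shards", "localhost:8080/solr/core1,localhost:8080/solr/core2");
        query.setFacet(true);
        query.addFacetField("dataSource");           // counts per data source
        // query.addFilterQuery("dataSource:D1");    // narrow to one source when a facet is clicked

        QueryResponse rsp = server.query(query);
        System.out.println("Total hits: " + rsp.getResults().getNumFound());
        for (FacetField ff : rsp.getFacetFields()) {
            for (FacetField.Count c : ff.getValues()) {
                System.out.println(c.getName() + ": " + c.getCount());
            }
        }
    }
}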
Indexing rich documents from websites using ExtractingRequestHandler
Hello, I can index rich documents like pdf for instance that are on the filesystem. Can we use ExtractingRequestHandler to index files that are accessible on a website? For example, there is a file that can be reached like so: http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf How would I go about indexing that file? I tried using the following combinations. I will put the errors in brackets: stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The filename, directory name, or volume label syntax is incorrect) stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of the specified network name is invalid) stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot find the path specified) stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path was not found) I sort of understand why I get those errors. What are the alternative methods of doing this? I am guessing that the stream.file attribute doesn't support web addresses. Is there another attribute that does? -- View this message in context: http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html Sent from the Solr - User mailing list archive at Nabble.com.
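Since stream.file only takes a local filesystem path, one workaround (just a sketch; the local path is made up) is to download the remote file first and then index the local copy exactly as before:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class FetchForExtraction {
    public static void main(String[] args) throws Exception {
        URL remote = new URL("http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf");
        File local = new File("/tmp/testfile.pdf"); // assumed local path

        // Copy the remote PDF to disk so it can be indexed with stream.file as usual.
        InputStream in = remote.openStream();
        OutputStream out = new FileOutputStream(local);
        byte[] buf = new byte[8192];
        for (int len; (len = in.read(buf)) > 0; ) {
            out.write(buf, 0, len);
        }
        out.close();
        in.close();

        System.out.println("Downloaded " + local.length() + " bytes to " + local);
        // The extract request can then point at the downloaded file, e.g.
        // ...?stream.file=/tmp/testfile.pdf
    }
}

Alternatively, Solr's remote streaming supports a stream.url parameter (it has to be enabled via enableRemoteStreaming in solrconfig.xml), which may remove the need for the local copy altogether.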
Question regarding ExtractingRequestHandler
Hello, I've recently started using this handler to index MS Word and PDF files. When I set ext.extract.only=true, I get back all the metadata that is associated with that file. If I want to index, I need to set ext.extract.only=false. If I want to index all that metadata along with the contents, what inputs do I need to pass to the http request? Do I have to specifically define all the fields in the schema or can Solr dynamically generate those fields? Thanks. -- View this message in context: http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Installing a patch in a solr nightly on Windows
When I go to the source and I input the command, I get: bash: patch: command not found Thanks Koji Sekiguchi-2 wrote: > > ahammad wrote: >> Thanks for the suggestions: >> >> Koji: I am aware of Cygwin. The problem is I am not sure how to do the >> whole >> thing. I downloaded a nightly zip file and extracted it to a directory. >> Where do I put the .patch file? Where do I execute the "patch..." command >> from? It doesn't work when I do it at the root of the install. >> >> > It should work at the root of the install: > > $ patch -p0 < SOLR-284.patch > > Do you see an error message? What's error? > > Koji > > > > -- View this message in context: http://www.nabble.com/Installing-a-patch-in-a-solr-nightly-on-Windows-tp24273921p24307414.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Installing a patch in a solr nightly on Windows
Thanks for the suggestions: Koji: I am aware of Cygwin. The problem is I am not sure how to do the whole thing. I downloaded a nightly zip file and extracted it to a directory. Where do I put the .patch file? Where do I execute the "patch..." command from? It doesn't work when I do it at the root of the install. Michael: I'll take a look at that standalone utility. Paul: I assume that in order to do it with svn, you need to checkout the trunk? What do you do after that? Do you have the link to the distributions? I get "OPTIONS of 'http://svn.apache.org/repos/asf/lucene/solr/trunk': could not connect to server (http://svn.apache.org)" when I try. Something tells me that my proxy is blocking the connection. If that is the case, then I don't think that I can do a checkout. Do you have any other alternatives? Thanks again for the input. ahammad wrote: > > Hello, > > I am trying to install a patch for Solr > (https://issues.apache.org/jira/browse/SOLR-284) but I'm not sure how to > do it in Windows. > > I have a copy of the nightly build, but I don't know how to proceed. I > looked at the HowToContribute wiki for patch installation instructions, > but there are no Windows specific instructions in there. > > Any help would be greatly appreciated. > > Thanks > -- View this message in context: http://www.nabble.com/Installing-a-patch-in-a-solr-nightly-on-Windows-tp24273921p24306501.html Sent from the Solr - User mailing list archive at Nabble.com.
Installing a patch in a solr nightly on Windows
Hello, I am trying to install a patch for Solr (https://issues.apache.org/jira/browse/SOLR-284) but I'm not sure how to do it in Windows. I have a copy of the nightly build, but I don't know how to proceed. I looked at the HowToContribute wiki for patch installation instructions, but there are no Windows specific instructions in there. Any help would be greatly appreciated. Thanks -- View this message in context: http://www.nabble.com/Installing-a-patch-in-a-solr-nightly-on-Windows-tp24273921p24273921.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using SolrJ with multicore/shards
Sorry for the additional message, the disclaimer was missing. Disclaimer: The code that was used was taken from the following site: http://e-mats.org/2008/04/using-solrj-a-short-guide-to-getting-started-with-solrj/ . ahammad wrote: > > Hello, > > I played around some more with it and I found out that I was pointing my > constructor to an older class that doesn't have the MultiCore capability. > > This is what I did to set up the shards: > > query.setParam("shards", > "localhost:8080/solr/core0/,localhost:8080/solr/core1/"); > > I do have a new issue with this though. Here is how the results are > displayed: > >QueryResponse qr = server.query(query); > > SolrDocumentList sdl = qr.getResults(); > > System.out.println("Found: " + sdl.getNumFound()); > System.out.println("Start: " + sdl.getStart()); > System.out.println("Max Score: " + sdl.getMaxScore()); > System.out.println(""); > > ArrayList> hitsOnPage = new > ArrayList>(); > > for(SolrDocument d : sdl) > { > > HashMap values = new HashMap Object>(); > > for(Iterator> i = d.iterator(); > i.hasNext(); ) > { > Map.Entry e2 = i.next(); > > values.put(e2.getKey(), e2.getValue()); > } > > hitsOnPage.add(values); > > String outputString = new String( values.get("title") ); > System.out.println(outputString); > } > > The field "title" is one of the common fields that is shared between the > two schemas. When I print the results of my query, I get null for > everything. However, the result of sdl.getNumFound() is correct, so I know > that both cores are being accessed. > > Is there a difference with how SolrJ handles multicore requests? > > Disclaimer: The code > > > > ahammad wrote: >> >> Hello, >> >> I have a MultiCore install of solr with 2 cores with different schemas >> and such. Querying directly using http request and/or the solr interface >> works very well for my purposes. >> >> I want to have a proper search interface though, so I have some code that >> basically acts as a link between the server and the front-end. Basically, >> depending on the options, the search string is built, and when the search >> is submitted, that string gets passed as an http request. The code then >> would parse through the xml to get the information. >> >> This method works with shards because I can add the shards parameter >> straight into the link that I end up hitting. Although this is currently >> functional, I was thinking of using SolrJ simply because it is simpler to >> use and would cut down the amount of code. >> >> The question is, how would I be able to define the shards in my query, so >> that when I do search, I hit both shards and get mixed results back? >> Using http requests, it's as simple as adding a shard=core0,core1 >> snippet. What is the equivalent of this in SolrJ? >> >> BTW, I do have some SolrJ code that is able to query and return results, >> but for a single core. I am currently using CommonsHttpSolrServer for >> that, not the Embedded one. >> >> Cheers >> > > -- View this message in context: http://www.nabble.com/Using-SolrJ-with-multicore-shards-tp23834518p23838988.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using SolrJ with multicore/shards
Hello, I played around some more with it and I found out that I was pointing my constructor to an older class that doesn't have the MultiCore capability. This is what I did to set up the shards: query.setParam("shards", "localhost:8080/solr/core0/,localhost:8080/solr/core1/"); I do have a new issue with this though. Here is how the results are displayed:

QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();

System.out.println("Found: " + sdl.getNumFound());
System.out.println("Start: " + sdl.getStart());
System.out.println("Max Score: " + sdl.getMaxScore());
System.out.println("");

ArrayList<HashMap<String, Object>> hitsOnPage = new ArrayList<HashMap<String, Object>>();

for (SolrDocument d : sdl) {
    HashMap<String, Object> values = new HashMap<String, Object>();
    for (Iterator<Map.Entry<String, Object>> i = d.iterator(); i.hasNext(); ) {
        Map.Entry<String, Object> e2 = i.next();
        values.put(e2.getKey(), e2.getValue());
    }
    hitsOnPage.add(values);
    String outputString = (String) values.get("title");
    System.out.println(outputString);
}

The field "title" is one of the common fields that is shared between the two schemas. When I print the results of my query, I get null for everything. However, the result of sdl.getNumFound() is correct, so I know that both cores are being accessed. Is there a difference with how SolrJ handles multicore requests? Disclaimer: The code

ahammad wrote:
> Hello,
>
> I have a MultiCore install of solr with 2 cores with different schemas and such. Querying directly using http request and/or the solr interface works very well for my purposes.
>
> I want to have a proper search interface though, so I have some code that basically acts as a link between the server and the front-end. Basically, depending on the options, the search string is built, and when the search is submitted, that string gets passed as an http request. The code then would parse through the xml to get the information.
>
> This method works with shards because I can add the shards parameter straight into the link that I end up hitting. Although this is currently functional, I was thinking of using SolrJ simply because it is simpler to use and would cut down the amount of code.
>
> The question is, how would I be able to define the shards in my query, so that when I do search, I hit both shards and get mixed results back? Using http requests, it's as simple as adding a shard=core0,core1 snippet. What is the equivalent of this in SolrJ?
>
> BTW, I do have some SolrJ code that is able to query and return results, but for a single core. I am currently using CommonsHttpSolrServer for that, not the Embedded one.
>
> Cheers
>
-- View this message in context: http://www.nabble.com/Using-SolrJ-with-multicore-shards-tp23834518p23838351.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using SolrJ with multicore/shards
I'm still not sure what you meant. I took a look at that class but I haven't got any idea on how to proceed. BTW I tried something like this query.setParam("shard", "http://localhost:8080/solr/core0/"; , "http://localhost:8080/solr/core1/";); But it doesn't seem to work for me. I tried it with different variations too, like removing the http://, and combining both cores as a single string. Could you please clarify your suggestion? Regards Otis Gospodnetic wrote: > > > You should be able to set any name=value URL parameter pair and send it to > Solr using SolrJ. What's the name of that class... MapSolrParams, I > believe. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: ahammad >> To: solr-user@lucene.apache.org >> Sent: Tuesday, June 2, 2009 11:06:55 AM >> Subject: Using SolrJ with multicore/shards >> >> >> Hello, >> >> I have a MultiCore install of solr with 2 cores with different schemas >> and >> such. Querying directly using http request and/or the solr interface >> works >> very well for my purposes. >> >> I want to have a proper search interface though, so I have some code that >> basically acts as a link between the server and the front-end. Basically, >> depending on the options, the search string is built, and when the search >> is >> submitted, that string gets passed as an http request. The code then >> would >> parse through the xml to get the information. >> >> This method works with shards because I can add the shards parameter >> straight into the link that I end up hitting. Although this is currently >> functional, I was thinking of using SolrJ simply because it is simpler to >> use and would cut down the amount of code. >> >> The question is, how would I be able to define the shards in my query, so >> that when I do search, I hit both shards and get mixed results back? >> Using >> http requests, it's as simple as adding a shard=core0,core1 snippet. What >> is >> the equivalent of this in SolrJ? >> >> BTW, I do have some SolrJ code that is able to query and return results, >> but >> for a single core. I am currently using CommonsHttpSolrServer for that, >> not >> the Embedded one. >> >> Cheers >> -- >> View this message in context: >> http://www.nabble.com/Using-SolrJ-with-multicore-shards-tp23834518p23834518.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Using-SolrJ-with-multicore-shards-tp23834518p23836485.html Sent from the Solr - User mailing list archive at Nabble.com.
Using SolrJ with multicore/shards
Hello, I have a MultiCore install of solr with 2 cores with different schemas and such. Querying directly using http request and/or the solr interface works very well for my purposes. I want to have a proper search interface though, so I have some code that basically acts as a link between the server and the front-end. Basically, depending on the options, the search string is built, and when the search is submitted, that string gets passed as an http request. The code then would parse through the xml to get the information. This method works with shards because I can add the shards parameter straight into the link that I end up hitting. Although this is currently functional, I was thinking of using SolrJ simply because it is simpler to use and would cut down the amount of code. The question is, how would I be able to define the shards in my query, so that when I do search, I hit both shards and get mixed results back? Using http requests, it's as simple as adding a shard=core0,core1 snippet. What is the equivalent of this in SolrJ? BTW, I do have some SolrJ code that is able to query and return results, but for a single core. I am currently using CommonsHttpSolrServer for that, not the Embedded one. Cheers -- View this message in context: http://www.nabble.com/Using-SolrJ-with-multicore-shards-tp23834518p23834518.html Sent from the Solr - User mailing list archive at Nabble.com.
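(For comparison, the plain-HTTP form of such a request would look something like the following; the host, port and core names are examples only. The parameter is "shards", and each value is a host:port/path without the http:// prefix:

http://localhost:8080/solr/core0/select?q=solr&shards=localhost:8080/solr/core0,localhost:8080/solr/core1
)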
Question about field types and querying
Hello, I have a field of type "text" in my collection called "question". When I query for the word "customer", for example, in the "question" field (i.e. q=question:customer), the document with the highest score shows up but does not contain the word "customer" at all. Instead, it contains the word "customize". What would be a way around this? I tried changing the type to string instead of text, but then I wouldn't get any results unless the query matches the stored value exactly... -- View this message in context: http://www.nabble.com/Question-about-field-types-and-querying-tp23768061p23768061.html Sent from the Solr - User mailing list archive at Nabble.com.
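(This looks like stemming at work: the example "text" field type runs terms through a Porter-style stemmer, and "customer" and "customize" likely both reduce to the same stem ("custom"), so they match each other. A common workaround is to index an unstemmed copy of the field and query that when exact tokens matter. A rough schema.xml sketch, with made-up type and field names:

<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="question_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="question" dest="question_exact"/>

Querying q=question_exact:customer would then only match documents whose question field literally contains the token "customer".)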
Re: Problems getting up and running.
Hello, In the solrconfig.xml file, there is a property: <dataDir>${solr.data.dir:./solr/data}</dataDir> Try setting something else in here and see what happens...I'm not sure how solr works with Ubuntu, but it's worth a shot... Tim Haughton wrote: > > OK, I spoke too soon. > > When you tried it on your Mac, did it create the index in the right place? > Mine is still trying to create it under the webapps directory. > > Cheers, > > Tim > > 2009/5/28 Tim Haughton > >> 2009/5/28 Koji Sekiguchi >> >>> >>> Ok. >>> I've just tried it (the way you quoted above) on my Mac and worked >>> fine... >>> Do you see any errors on Tomcat log when starting? >>> >> >> Sussed it. As you would imagine it was the stupidest of things. And >> probably the *one* thing left out of my description. My solr.xml file had >> an >> illegal character at the top of the file. I hadn't noticed it until the >> error logs pushed me in the right direction. Thanks for the pointer. >> >> Cheers, >> >> Tim >> >> >> > > -- View this message in context: http://www.nabble.com/Problems-getting-up-and-running.-tp23758840p23761217.html Sent from the Solr - User mailing list archive at Nabble.com.
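(If the index keeps appearing under the webapps directory, one thing worth trying is an absolute path, so the data location no longer depends on the servlet container's working directory; the path below is only an example:

<dataDir>/var/solr/data</dataDir>
)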
Re: Indexing from DB connection issue
Hello, I tried your suggestion, and it still gives me the same error. I'd like to point out again that the same folder/config setup is running on my machine with no issues, but it gives me that stack trace in the logs on the server. When I do the full data import request through the browser, I get this: − 0 0 − − data-config.xml full-import idle − 0:0:1.329 1 0 0 0 2009-05-27 09:42:24 − This response format is experimental. It is likely to change in the future. Refreshing the page usually results in requests to datasource/rows fetched etc numbers to increase. In my case the request to datasource stays at 1 regardless. Looks like it tries once and fails, then it terminates the process... Regards Noble Paul നോബിള് नोब्ळ्-2 wrote: > > no need to rename . > > On Wed, May 27, 2009 at 6:50 PM, ahammad wrote: >> >> Would I need to rename it or refer to it somewhere? Or can I keep the >> existing name (apache-solr-dataimporthandler-1.4-dev.jar)? >> >> Cheers >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> take the trunk dih.jar. use winzip/winrar or any tool and just delete >>> all the files other than ClobTransformer.class. put that jar into >>> solr.home/lib >>> >>> On Wed, May 27, 2009 at 6:10 PM, ahammad wrote: >>>> >>>> Hmmm, that's probably a good idea...although it does not explain how my >>>> current local setup works. >>>> >>>> Can you please explain how this is done? I am assuming that I need to >>>> add >>>> the class itself to the source of solr 1.3, and then compile the code, >>>> and >>>> take the new .war file and put it in Tomcat? If that is correct, where >>>> in >>>> the source folders would the ClobTransformer.class file go? >>>> >>>> Thanks. >>>> >>>> >>>> >>>> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>>>> >>>>> I guess it is better to copy the ClobTransformer.class alone and use >>>>> the old Solr1.3 DIH >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, May 26, 2009 at 11:50 PM, ahammad >>>>> wrote: >>>>>> >>>>>> I have an update: >>>>>> >>>>>> I played around with it some more and it seems like it's being caused >>>>>> by >>>>>> the >>>>>> ClobTransformer. If I remove the 'clob="true"' from the field part in >>>>>> the >>>>>> data-config, it works fine. >>>>>> >>>>>> The Solr install is a multicore one. I placed the >>>>>> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in >>>>>> the >>>>>> {solrHome}/core1/lib directory (I only need it for the first core). >>>>>> Is >>>>>> there >>>>>> something else I need to do for it to work? >>>>>> >>>>>> I don't recall doing an additional step when I did this a few weeks >>>>>> ago >>>>>> on >>>>>> my local machine. >>>>>> >>>>>> Any help is appreciated. >>>>>> >>>>>> Regards >>>>>> >>>>>> >>>>>> ahammad wrote: >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>> I am tyring to index directly from an Oracle DB. This is what >>>>>>> appears >>>>>>> in >>>>>>> the stack trace: >>>>>>> >>>>>>> SEVERE: Full Import failed >>>>>>> org.apache.solr.handler.dataimport.DataImportHandlerException: >>>>>>> Unable >>>>>>> to >>>>>>> execute query: select * from ARTICLE Processing Document # 1 >>>>>>> at >>>>>>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) >>>>>>> at >>>>>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) >>>>>>> at >>>>>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) >>>>>>> at >>>>>>> org.apache.solr.handler.dataimport.Sq
Re: Indexing from DB connection issue
Would I need to rename it or refer to it somewhere? Or can I keep the existing name (apache-solr-dataimporthandler-1.4-dev.jar)? Cheers Noble Paul നോബിള് नोब्ळ्-2 wrote: > > take the trunk dih.jar. use winzip/winrar or any tool and just delete > all the files other than ClobTransformer.class. put that jar into > solr.home/lib > > On Wed, May 27, 2009 at 6:10 PM, ahammad wrote: >> >> Hmmm, that's probably a good idea...although it does not explain how my >> current local setup works. >> >> Can you please explain how this is done? I am assuming that I need to add >> the class itself to the source of solr 1.3, and then compile the code, >> and >> take the new .war file and put it in Tomcat? If that is correct, where in >> the source folders would the ClobTransformer.class file go? >> >> Thanks. >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> I guess it is better to copy the ClobTransformer.class alone and use >>> the old Solr1.3 DIH >>> >>> >>> >>> >>> >>> On Tue, May 26, 2009 at 11:50 PM, ahammad >>> wrote: >>>> >>>> I have an update: >>>> >>>> I played around with it some more and it seems like it's being caused >>>> by >>>> the >>>> ClobTransformer. If I remove the 'clob="true"' from the field part in >>>> the >>>> data-config, it works fine. >>>> >>>> The Solr install is a multicore one. I placed the >>>> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in >>>> the >>>> {solrHome}/core1/lib directory (I only need it for the first core). Is >>>> there >>>> something else I need to do for it to work? >>>> >>>> I don't recall doing an additional step when I did this a few weeks ago >>>> on >>>> my local machine. >>>> >>>> Any help is appreciated. >>>> >>>> Regards >>>> >>>> >>>> ahammad wrote: >>>>> >>>>> Hello all, >>>>> >>>>> I am tyring to index directly from an Oracle DB. 
This is what appears >>>>> in >>>>> the stack trace: >>>>> >>>>> SEVERE: Full Import failed >>>>> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable >>>>> to >>>>> execute query: select * from ARTICLE Processing Document # 1 >>>>> at >>>>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) >>>>> at >>>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) >>>>> at >>>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) >>>>> at >>>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>>>> at >>>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) >>>>> at >>>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) >>>>> at >>>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>>>> at >>>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>>>> at >>>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>>>> at >>>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>>>> at >>>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >>>>> Caused by: java.sql.SQLException: Closed Connection >>>>> at >>>>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) >>>>> at >>>>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) >>>>> at >>>>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) >>>>> at >>>>> oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) >>>>> at >>>>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDat
Re: Indexing from DB connection issue
Hmmm, that's probably a good idea...although it does not explain how my current local setup works. Can you please explain how this is done? I am assuming that I need to add the class itself to the source of solr 1.3, and then compile the code, and take the new .war file and put it in Tomcat? If that is correct, where in the source folders would the ClobTransformer.class file go? Thanks. Noble Paul നോബിള് नोब्ळ्-2 wrote: > > I guess it is better to copy the ClobTransformer.class alone and use > the old Solr1.3 DIH > > > > > > On Tue, May 26, 2009 at 11:50 PM, ahammad wrote: >> >> I have an update: >> >> I played around with it some more and it seems like it's being caused by >> the >> ClobTransformer. If I remove the 'clob="true"' from the field part in the >> data-config, it works fine. >> >> The Solr install is a multicore one. I placed the >> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the >> {solrHome}/core1/lib directory (I only need it for the first core). Is >> there >> something else I need to do for it to work? >> >> I don't recall doing an additional step when I did this a few weeks ago >> on >> my local machine. >> >> Any help is appreciated. >> >> Regards >> >> >> ahammad wrote: >>> >>> Hello all, >>> >>> I am tyring to index directly from an Oracle DB. This is what appears in >>> the stack trace: >>> >>> SEVERE: Full Import failed >>> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to >>> execute query: select * from ARTICLE Processing Document # 1 >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) >>> at >>> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>> at >>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>> at >>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>> at >>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>> at >>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >>> Caused by: java.sql.SQLException: Closed Connection >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) >>> at >>> oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) >>> ... 10 more >>> >>> Funny thing is, the data import works on my local machine. I moved all >>> the >>> config files to another server, and I get this. I reindexed on my local >>> machine immediately after in order to verify that the DB works, and it >>> indexes fine. 
>>> >>> Here is my data-config file, just in case: >>> >>> >>> >> user="xxx" password="xxx"/> >>> >>> >> transformer="ClobTransformer"> >>> >>> >> clob="true" /> >>> >>> >>> >> query="select ID_A from ARTICLE_AUTHOR >>> where ID_A='${ARTICLE.ID}'"> >>> >> name="author" /> >>> >>> >>> >>> >>> >>> >>> I am using the 1.3 release version, with the 1.4 DIH jar file for the >>> Clob >>> Transformer. What could be causing this? >>> >>> Cheers >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23741712.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multicore Solr not returning expects results from search
I too am using 1.3. The way you specified shards is correct. For instance, I normally make the request to core0, and in the shards parameter, I put the addresses of both core0 and core1. I am using Tomcat though, so that may be different... Is there anything in the logs that strikes you as odd when you query across multiple shards? KennyN wrote: > > Thanks for the reply ahammad, that helps. Are you specifying them both in > a URL, or in the <str name="shards">localhost:8983/solr/core0,localhost:8983/solr/core1</str> > like I have? > > I should add that I now have two indices that have different data in them. > That is to say the ids are unique across both shards and I am still seeing > this issue... > > I should also note that this is Solr 1.3, I don't think I mentioned that > before. > > > ahammad wrote: >> >> I have a multicore setup as well, and when I query something, I do it >> through core0, then specify both core0 and core1 in the "shards" >> parameter. >> >> However, I don't have identical indices. The results I get back are >> basically an addition of both cores' results. >> >> Good luck, please reply to this message if you have it figured out, I am >> curious to know what's going on. >> >> Regards >> > > -- View this message in context: http://www.nabble.com/Multicore-Solr-not-returning-expects-results-from-search-tp23623975p23730420.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multicore Solr not returning expects results from search
I have a multicore setup as well, and when I query something, I do it through core0, then specify both core0 and core1 in the "shards" parameter. However, I don't have identical indices. The results I get back are basically an addition of both cores' results. Good luck, please reply to this message if you have it figured out, I am curious to know what's going on. Regards KennyN wrote: > > I am still trying to figure this out... I am thinking maybe I have the > shards setup wrong? If I have core0 and core1 with indices, and then I run > the query on core0, specifying shards of core0 and core1. Is this how I > should be doing it? Or should I have another core just to specify the > other shards? > -- View this message in context: http://www.nabble.com/Multicore-Solr-not-returning-expects-results-from-search-tp23623975p23729379.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing from DB connection issue
I have an update: I played around with it some more and it seems like it's being caused by the ClobTransformer. If I remove the 'clob="true"' from the field part in the data-config, it works fine. The Solr install is a multicore one. I placed the apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the {solrHome}/core1/lib directory (I only need it for the first core). Is there something else I need to do for it to work? I don't recall doing an additional step when I did this a few weeks ago on my local machine. Any help is appreciated. Regards ahammad wrote: > > Hello all, > > I am tyring to index directly from an Oracle DB. This is what appears in > the stack trace: > > SEVERE: Full Import failed > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to > execute query: select * from ARTICLE Processing Document # 1 > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) > Caused by: java.sql.SQLException: Closed Connection > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) > at > oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) > ... 10 more > > Funny thing is, the data import works on my local machine. I moved all the > config files to another server, and I get this. I reindexed on my local > machine immediately after in order to verify that the DB works, and it > indexes fine. > > Here is my data-config file, just in case: > > > user="xxx" password="xxx"/> > > transformer="ClobTransformer"> > > > > > > > > > > > > > I am using the 1.3 release version, with the 1.4 DIH jar file for the Clob > Transformer. What could be causing this? > > Cheers > -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html Sent from the Solr - User mailing list archive at Nabble.com.
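(For reference, the wiring being discussed looks roughly like this in data-config.xml; the column name here is invented, and clob="true" is what makes the ClobTransformer convert the CLOB value into a plain string:

<entity name="ARTICLE" query="select * from ARTICLE" transformer="ClobTransformer">
  <field column="CONTENT" name="content" clob="true"/>
</entity>
)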
Re: Indexing from DB connection issue
Hello Erik, Yes, the drivers are there. I forgot to mention, this DB indexing was working before on the server when the DB was using a different schema. The schema has changed, so I did all my testing on my local machine. When I saw that it worked fine, I put in the new connection string/user/pass and tried it on the server... Erik Hatcher wrote: > > Did you move the Oracle JDBC driver to the other machine also? > > Erik > > On May 26, 2009, at 11:37 AM, ahammad wrote: > >> >> Hello all, >> >> I am tyring to index directly from an Oracle DB. This is what >> appears in the >> stack trace: >> >> SEVERE: Full Import failed >> org.apache.solr.handler.dataimport.DataImportHandlerException: >> Unable to >> execute query: select * from ARTICLE Processing Document # 1 >> at >> org.apache.solr.handler.dataimport.JdbcDataSource >> $ResultSetIterator.(JdbcDataSource.java:186) >> at >> org >> .apache >> .solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java: >> 143) >> at >> org >> .apache >> .solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java: >> 43) >> at >> org >> .apache >> .solr >> .handler >> .dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >> at >> org >> .apache >> .solr >> .handler >> .dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) >> at >> org >> .apache >> .solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) >> at >> org >> .apache >> .solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >> at >> org >> .apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java: >> 136) >> at >> org >> .apache >> .solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java: >> 334) >> at >> org >> .apache >> .solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >> at >> org.apache.solr.handler.dataimport.DataImporter >> $1.run(DataImporter.java:377) >> Caused by: java.sql.SQLException: Closed Connection >> at >> oracle >> .jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) >> at >> oracle >> .jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) >> at >> oracle >> .jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) >> at >> oracle >> .jdbc >> .driver.PhysicalConnection.createStatement(PhysicalConnection.java: >> 755) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource >> $ResultSetIterator.(JdbcDataSource.java:174) >> ... 10 more >> >> Funny thing is, the data import works on my local machine. I moved >> all the >> config files to another server, and I get this. I reindexed on my >> local >> machine immediately after in order to verify that the DB works, and it >> indexes fine. >> >> Here is my data-config file, just in case: >> >> >>> user="xxx" password="xxx"/> >> >>> transformer="ClobTransformer"> >> >> >> >> >> >> >> >> >> >> >> >> >> I am using the 1.3 release version, with the 1.4 DIH jar file for >> the Clob >> Transformer. What could be causing this? >> >> Cheers >> -- >> View this message in context: >> http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23725712.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23727121.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing from DB connection issue
Hello all, I am trying to index directly from an Oracle DB. This is what appears in the stack trace: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from ARTICLE Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:186) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Caused by: java.sql.SQLException: Closed Connection at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) at oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:174) ... 10 more Funny thing is, the data import works on my local machine. I moved all the config files to another server, and I get this. I reindexed on my local machine immediately after in order to verify that the DB works, and it indexes fine. Here is my data-config file, just in case: I am using the 1.3 release version, with the 1.4 DIH jar file for the Clob Transformer. What could be causing this? Cheers -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23725712.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unique Identifiers
Hello, How would I go about creating an aggregate entry? Does it go in the data-config.xml file? Also, out of curiosity, how can I access the UUIDField variable? It may be required for something else. Cheers Erik Hatcher wrote: > > > On Apr 28, 2009, at 9:49 AM, ahammad wrote: >> Is it possible for Solr to assign a unique number to every document? > > Solr has a UUIDField that can be used for this. But... > >> For example, let's say that I am indexing from several databases with >> different data structures. The first one has a unique field called >> artID, >> and the second database has a unique field called SRNum. If I want >> to have >> an interface that allows me to search both of those data sources, it >> makes >> it easier to have a single field per document that is common to both >> datasources...maybe something like uniqueDocID or something like that. >> >> That field does not exist in the DB. Is it possible for Solr to >> create that >> field and assign a number while it's indexing? > > I recommend an aggregate unique key field, using maybe this scheme: > > -' > > Erik > > > -- View this message in context: http://www.nabble.com/Unique-Identifiers-tp23277538p23318361.html Sent from the Solr - User mailing list archive at Nabble.com.
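(A rough sketch of the aggregate-key idea in data-config.xml, using DIH's TemplateTransformer; the entity name, column and "articles-" prefix are placeholders built around the artID column mentioned in this thread:

<entity name="ARTICLE" query="select * from ARTICLE"
        transformer="TemplateTransformer">
  <!-- produces values like "articles-123" so every source can share one id field -->
  <field column="uniqueDocID" template="articles-${ARTICLE.artID}"/>
</entity>

The UUIDField route, by contrast, lives entirely in schema.xml: declare a field type with class="solr.UUIDField" and give the field default="NEW", and Solr should generate the value itself at index time.)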
Importing data from Sybase
Hello, I'm trying to index data from a Sybase DB, but when I attempt to do a full import, it fails. This is in the log: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.AbstractMethodError: com.sybase.jdbc2.jdbc.SybConnection.setHoldability(I)V at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.AbstractMethodError: com.sybase.jdbc2.jdbc.SybConnection.setHoldability(I)V at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:181) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:361) at org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:237) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:207) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335) This seems to me like a Sybase issue but I'm unsure. Is Solr designed to be compatible with Sybase or has it had issues in the past? My data-config.xml file is pretty much the same as the one I have for an Oracle DB, except with the appropriate changes made to driver/url/user/password fields. Cheers -- View this message in context: http://www.nabble.com/Importing-data-from-Sybase-tp23284464p23284464.html Sent from the Solr - User mailing list archive at Nabble.com.
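(The AbstractMethodError points at the JDBC driver rather than at Solr: com.sybase.jdbc2.* is the old JDBC 2.0 jConnect driver, and the stack trace shows DIH calling the JDBC 3.0 method setHoldability, which that driver never implemented. A sketch of the dataSource element, assuming the newer jConnect driver jar (jconn3) is on the classpath instead; the driver class, host, port and database name below are assumptions to adapt:

<dataSource driver="com.sybase.jdbc3.jdbc.SybDriver"
            url="jdbc:sybase:Tds:dbhost:5000/MYDB"
            user="xxx" password="xxx"/>
)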
Re: Unable to import data from database
Did you define all the fields that you used in schema.xml? Ci-man wrote: > > I am using MS SQL server and want to index a table. > I setup my data-config like this: > > > autoCommit="true" > driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" > url="jdbc:sqlserver://localhost:1433;databaseName=MYDB" > user="" password=""/> > > > > > > > > > > > > > > I am unable to load data from database. I always receive 0 document > fetched: > > 0:0:12.989 > 1 > 0 > 0 > 0 > 2009-04-28 14:37:49 > > > The query runs in SQL Server query manager and retrieves records. The > funny thing is, even if I purposefully write a wrong query with > non-existing tables I get the same response. What am I doing wrong? How > can I tell whether a query fails or succeeds or if solr is running the > query in the first place? > > Any help is appreciated. > Best, > -Ci > > > -- View this message in context: http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23284381.html Sent from the Solr - User mailing list archive at Nabble.com.
Unique Identifiers
Hello all, Is it possible for Solr to assign a unique number to every document? For example, let's say that I am indexing from several databases with different data structures. The first one has a unique field called artID, and the second database has a unique field called SRNum. If I want to have an interface that allows me to search both of those data sources, it makes it easier to have a single field per document that is common to both datasources...maybe something like uniqueDocID or something like that. That field does not exist in the DB. Is it possible for Solr to create that field and assign a number while it's indexing? Cheers -- View this message in context: http://www.nabble.com/Unique-Identifiers-tp23277538p23277538.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing from a DB, corrupt Lucene index
Excuse the error in the title. It should say "missing Lucene index" Cheers ahammad wrote: > > Hello, > > I finally was able to run a full import on an Oracle database. According > to the statistics, it looks like it fetched all the rows from the table. > However, When I go into /data, there is nothing in there. > > This is my data-config.xml file: > > > user="" password=""/> > > > > > > > > > > > > > > > I added all the relevant fileds in the schema.xml file. From the interface > when I do dataimport?command=full-import, it says that "n rows were > fetched", where n is the actual number of rows in the DB table. Everything > looks great from there, but there is nothing in my data folder. In > solrconfig.xml, the line that defines the location where data is stored > is: > > ${solr.data.dir:./solr/data} > > What am I missing exactly? BTW, the Tomcat logs don't show errors or > anything like that. > > Cheers and Thank you. > -- View this message in context: http://www.nabble.com/Indexing-from-a-DB%2C-corrupt-Lucene-index-tp23175796p23175805.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing from a DB, corrupt Lucene index
Hello, I finally was able to run a full import on an Oracle database. According to the statistics, it looks like it fetched all the rows from the table. However, when I go into /data, there is nothing in there. This is my data-config.xml file: I added all the relevant fields in the schema.xml file. From the interface when I do dataimport?command=full-import, it says that "n rows were fetched", where n is the actual number of rows in the DB table. Everything looks great from there, but there is nothing in my data folder. In solrconfig.xml, the line that defines the location where data is stored is: <dataDir>${solr.data.dir:./solr/data}</dataDir> What am I missing exactly? BTW, the Tomcat logs don't show errors or anything like that. Cheers and Thank you. -- View this message in context: http://www.nabble.com/Indexing-from-a-DB%2C-corrupt-Lucene-index-tp23175796p23175796.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr to index a database
Thanks for the link... I'm still a bit unclear as to how it goes. For example, lets say i have a table called PRODUCTS, and within that table, I have the following columns: NUMBER (product number) NAME (product name) PRICE How would I index all this information? Here is an example (from the links you provided) of xml that confuses me: deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"> What is that deltaQuery (or even if it was a regular "query" expression) line for? It seems to me like a sort of filter. What if I don't want to filter anything and just want to index all the rows? Cheers Noble Paul നോബിള് नोब्ळ् wrote: > > On Mon, Apr 20, 2009 at 7:15 PM, ahammad wrote: >> >> Hello, >> >> I've never used Solr before, but I believe that it will suit my current >> needs with indexing information from a database. >> >> I downloaded and extracted Solr 1.3 to play around with it. I've been >> looking at the following tutorials: >> http://www.ibm.com/developerworks/java/library/j-solr-update/index.html >> http://www.ibm.com/developerworks/java/library/j-solr-update/index.html >> http://wiki.apache.org/solr/DataImportHandler >> http://wiki.apache.org/solr/DataImportHandler >> >> There are a few things I don't understand. For example, the IBM article >> sometimes refers to directories that aren't there, or a little different >> from what I have in my extracted copy of Solr (ie >> solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I >> can, >> but as soon as I put the following in solrconfig.xml, the whole thing >> breaks: >> >> > class="org.apache.solr.handler.dataimport.DataImportHandler"> >> >> rss-data-config.xml >> >> >> >> Obviously I replace with my own info...One thing I don't quite get is the >> data-config.xml file. What exactly is it? I've seen examples of what it >> contains but since I don't know enough, I couldn't really adjust it. In >> any >> case, this is the error I get, which may be because of a misconfigured >> data-config.xml... > the data-config.xml describes how to fetch data from various data > sources and index them into Solr. > > The stacktrace says that your xml is invalid. > > The best bet is to take one of the sample dataconfig xml files and make > changes. 
> > http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/db/conf/db-data-config.xml?revision=691151&view=markup > > http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/rss/conf/rss-data-config.xml?revision=691151&view=markup > > >> >> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception >> occurred while initializing context at >> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165) >> at >> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:99) >> at >> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96) >> at >> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388) >> at org.apache.solr.core.SolrCore.(SolrCore.java:571) at >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122) >> at >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) >> at >> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) >> at >> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) >> at >> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) >> at >> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) >> at >> org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) >> at >> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) >> at >> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740) >> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) >> at >> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831) at >> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720) at >> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490) at >> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150) at >> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
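(To make the query/deltaQuery distinction concrete: "query" is what command=full-import runs, while "deltaQuery" only comes into play for command=delta-import, where it selects the primary keys of rows changed since ${dataimporter.last_index_time}. If you just want to index every row, the delta attributes can be left out entirely. A rough data-config.xml sketch for the PRODUCTS example above; the driver, URL and Solr field names are placeholders, and each name= on the fields must also be declared in schema.xml:

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:orcl"
              user="xxx" password="xxx"/>
  <document>
    <!-- full-import simply runs "query" and maps each column to a Solr field -->
    <entity name="products" query="select NUMBER, NAME, PRICE from PRODUCTS">
      <field column="NUMBER" name="id"/>
      <field column="NAME" name="name"/>
      <field column="PRICE" name="price"/>
    </entity>
  </document>
</dataConfig>
)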
Re: Solr webinar
Hello Erik, I'm interested in attending the Webinar. I just have some questions to verify whether or not I am fit to attend... 1) How will it be carried out? What software or application would I need? 2) Do I have to have any experience or can I attend for the purpose of learning about Solr? Thanks for taking time to do this. Regards Erik Hatcher wrote: > > (excuse the cross-post) > > I'm presenting a webinar on Solr. Registration is limited, so sign up > soon. Looking forward to "seeing" some of you there! > > Thanks, > Erik > > > "Got data? You can build your own Solr-powered Search Engine!" > > Erik Hatcher, Lucene/Solr Committer and author, will show you how you > how to use Solr to build an Enterprise Search engine that indexes a > variety data sources all in a matter of minutes! > > Thursday, April 30, 2009 > 11:00AM - 12:00PM PDT / 2:00PM - 3:00PM EDT > > Sign up for this free webinar today at > http://www2.eventsvc.com/lucidimagination/?trk=E1 > > -- View this message in context: http://www.nabble.com/Solr-webinar-tp23138157p23138451.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr to index a database
For now it's unclear, as this is sort of an "experiment" to see how much we can do with it. I am inclined to use the index within Solr though, simply for the very powerful querying (the stuff I've seen at least). I am not exactly sure how much of the querying capabilities I'll require though. I'll take a look at LuSql and see if it can be used for my purposes. I want to get Solr working though, because I know that later down the road I'm going to need it for another project... Glen Newton wrote: > > You have not indicated how you wish to use the index (inside Solr or not). > > It is possible that LuSql might be an preferable alternative to > Solr/DataImportHandler, depending on your requirements. > > LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql > > Disclaimer: I am the author of LuSql. > > -glen > > 2009/4/20 ahammad : >> >> Hello, >> >> I've never used Solr before, but I believe that it will suit my current >> needs with indexing information from a database. >> >> I downloaded and extracted Solr 1.3 to play around with it. I've been >> looking at the following tutorials: >> http://www.ibm.com/developerworks/java/library/j-solr-update/index.html >> http://www.ibm.com/developerworks/java/library/j-solr-update/index.html >> http://wiki.apache.org/solr/DataImportHandler >> http://wiki.apache.org/solr/DataImportHandler >> >> There are a few things I don't understand. For example, the IBM article >> sometimes refers to directories that aren't there, or a little different >> from what I have in my extracted copy of Solr (ie >> solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I >> can, >> but as soon as I put the following in solrconfig.xml, the whole thing >> breaks: >> >> > class="org.apache.solr.handler.dataimport.DataImportHandler"> >> >> rss-data-config.xml >> >> >> >> Obviously I replace with my own info...One thing I don't quite get is the >> data-config.xml file. What exactly is it? I've seen examples of what it >> contains but since I don't know enough, I couldn't really adjust it. In >> any >> case, this is the error I get, which may be because of a misconfigured >> data-config.xml... 
>> >> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception >> occurred while initializing context at >> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165) >> at >> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:99) >> at >> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96) >> at >> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388) >> at org.apache.solr.core.SolrCore.(SolrCore.java:571) at >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122) >> at >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) >> at >> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) >> at >> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) >> at >> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) >> at >> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) >> at >> org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) >> at >> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) >> at >> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740) >> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) >> at >> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831) at >> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720) at >> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490) at >> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150) at >> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) >> at >> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) >> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) >> at >> org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at >> org.apache.catalina.core.StandardEngine.start(
Using Solr to index a database
Hello, I've never used Solr before, but I believe that it will suit my current needs with indexing information from a database. I downloaded and extracted Solr 1.3 to play around with it. I've been looking at the following tutorials: http://www.ibm.com/developerworks/java/library/j-solr-update/index.html http://www.ibm.com/developerworks/java/library/j-solr-update/index.html http://wiki.apache.org/solr/DataImportHandler http://wiki.apache.org/solr/DataImportHandler There are a few things I don't understand. For example, the IBM article sometimes refers to directories that aren't there, or a little different from what I have in my extracted copy of Solr (ie solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I can, but as soon as I put the following in solrconfig.xml, the whole thing breaks: rss-data-config.xml Obviously I replace with my own info...One thing I don't quite get is the data-config.xml file. What exactly is it? I've seen examples of what it contains but since I don't know enough, I couldn't really adjust it. In any case, this is the error I get, which may be because of a misconfigured data-config.xml... org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165) at org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:99) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388) at org.apache.solr.core.SolrCore.(SolrCore.java:571) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) at org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:78) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831) at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) at org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:448) at org.apache.catalina.core.StandardServer.start(StandardServer.java:700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433) Caused by: org.xml.sax.SAXParseException: The element type "document" must be terminated by the matching end-tag "". at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source) at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:153) It's unclear to me what I need to be using, as in what directories/files I need to implement this. Can someone please point me in the right direction? BTW, I'm using Tomcat 5.5 because the prepackaged Jetty simply doesn't work for me. It shows that it "started" in the command line, but it hangs, and doesn't actually work when I try to hit the Solr admin page (page not found type error). Jetty itself does start but the project doesn't seem to deploy... I apologize for the long post and if I didn't provide as much information as I should. Let me know if you need clarification with anything I said. Thank you very much. -- View this message in context: http://www.nabble.com/Using-Solr-to-index-a-database-tp23136944p23136944.html Sent from the Solr - User mailing list archive at
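(For reference, the DataImportHandler registration that the IBM article describes normally takes this shape in solrconfig.xml, with the config value naming whatever your data-config file is called:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
  </lst>
</requestHandler>

The SAXParseException at the bottom of the trace, though, is raised while loading the data-config itself, so the more likely culprit is an unterminated <document> element inside rss-data-config.xml rather than this handler block.)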
Re: Integrating Solr and Nutch
Thanks for your reply Andrzej. I am very interested in learning more about this and I cannot wait to check it out. Nutch is extremely good on its own, but I want to know what else can be done with the Nutch/Solr combo. Cheers Andrzej Bialecki wrote: > > Tony Wang wrote: >> I heard Nutch 1.0 will have an easy way to integrate with Solr, but I >> haven't found any documentation on that yet. anyone? > > Indeed, this integration is already supported in Nutch trunk (soon to be > released). Please download a nightly package and test it. > > You will need to reindex your segments using the solrindex command, and > change the searcher configuration. See nutch-default.xml for details. > > -- > Best regards, > Andrzej Bialecki <>< > ___. ___ ___ ___ _ _ __ > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > ___|||__|| \| || | Embedded Unix, System Integration > http://www.sigram.com Contact: info at sigram dot com > > > -- View this message in context: http://www.nabble.com/Integrating-Solr-and-Nutch-tp22252531p22289675.html Sent from the Solr - User mailing list archive at Nabble.com.
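(For anyone following along, the solrindex step Andrzej refers to is run from the Nutch install, along the lines of: bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/* -- the Solr URL and crawl paths depend on your own layout, so treat this as a sketch rather than the exact invocation.)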
Integrating Solr and Nutch
Hello, I'm wondering if it's possible to make Solr use a Nutch index. I used Nutch to crawl some pages and I now have an index with about 2000 documents. I want to explore the features of Solr, and since both Nutch and Solr are based off Lucene, I assume that there is some way to integrate them with one another. Has this been implemented? I am using the latest release versions of Nutch and Solr. Cheers -- View this message in context: http://www.nabble.com/Integrating-Solr-and-Nutch-tp22252531p22252531.html Sent from the Solr - User mailing list archive at Nabble.com.