[jira] [Commented] (SOLR-2822) don't run update processors twice

2012-05-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266916#comment-13266916
 ] 

Jan Høydahl commented on SOLR-2822:
---

{quote}
This idea is very similar to part of what Jan suggested in his comment 
above...SNIP...but what i'm thinking of wouldn't require named processors and 
would be specific to distributed updates (but wouldn't preclude named 
processors and more enhanced logic down the road if someone wanted it).
{quote}

Yup, that's one way. However, I think we can achieve even more transparency and 
flexibility by introducing the concept of *GOTO* in our pipeline! Remember in 
the old days of programming, we could jump to a specified place in the code 
(well, HTML's anchor does the same, but I thought GOTO was a cooler analogy :) 
) Let's say we create an interface {{ChainLabel}} with two methods 
{{getLabel/setLabel}}, same as your marker but with a nametag. Then 
DistribProcessor would set "distrib" as its label, and we could imagine a 
future processor which delegates processing to an external pipeline cluster, 
which sets another label "externalPipeline". We could even have a dummy noop 
UpdateProcessor which sets the label as a config param. Then, you could call 
{{update.chain.goto=myLabel}} to continue processing at the label. The URPChain 
class would not know about DistributedUpdateProcessor, but about labels and 
goto in general.
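
The label-and-goto idea can be sketched in plain Java. This is only an illustration under stated assumptions: `ChainLabel` with `getLabel`/`setLabel` comes from the comment above, while `Step`, `LabeledStep`, `run` and the step names are hypothetical stand-ins and not Solr's update processor API. The sketch also assumes that processing resumes at the labeled step itself (inclusive) and that an unknown label is an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabeledChain {
    /** Hypothetical interface from the comment above: a processor with a name tag. */
    interface ChainLabel {
        String getLabel();
        void setLabel(String label);
    }

    /** Simplified stand-in for an update processor; it only records that it ran. */
    static class Step {
        final String name;
        Step(String name) { this.name = name; }
        void process(List<String> trace) { trace.add(name); }
    }

    /** A step that also carries a label, e.g. the distributing processor. */
    static class LabeledStep extends Step implements ChainLabel {
        private String label;
        LabeledStep(String name, String label) { super(name); this.label = label; }
        public String getLabel() { return label; }
        public void setLabel(String label) { this.label = label; }
    }

    /**
     * Runs the chain and returns the names of the steps that ran. With a
     * non-null gotoLabel, processing starts at the step carrying that label
     * (inclusive, which is an assumption) and earlier steps are skipped.
     */
    static List<String> run(List<Step> chain, String gotoLabel) {
        int start = 0;
        if (gotoLabel != null) {
            start = -1;
            for (int i = 0; i < chain.size(); i++) {
                Step s = chain.get(i);
                if (s instanceof ChainLabel
                        && gotoLabel.equals(((ChainLabel) s).getLabel())) {
                    start = i;
                    break;
                }
            }
            if (start < 0) {
                throw new IllegalArgumentException("No such label: " + gotoLabel);
            }
        }
        List<String> trace = new ArrayList<>();
        for (Step s : chain.subList(start, chain.size())) {
            s.process(trace);
        }
        return trace;
    }

    /** Example chain; the step names are made up for illustration. */
    static List<Step> exampleChain() {
        return Arrays.asList(
                new Step("langid"),
                new LabeledStep("distrib", "distrib"),
                new Step("log"),
                new Step("run"));
    }

    public static void main(String[] args) {
        System.out.println(run(exampleChain(), null));      // full chain
        System.out.println(run(exampleChain(), "distrib")); // resume at the label
    }
}
```

The chain runner here only knows about labels, not about any particular processor, which is the point of the proposal: a request carrying {{update.chain.goto=distrib}} would skip the steps that already ran on the first node.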

I like your {{update.distrib=none|toleader|fromleader}} optimization.

> don't run update processors twice
> -
>
>     Key: SOLR-2822
> URL: https://issues.apache.org/jira/browse/SOLR-2822
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> An update will first go through processors until it gets to the point where 
> it is forwarded to the leader (or forwarded to replicas if already on the 
> leader).
> We need a way to skip over the processors that were already run (perhaps by 
> using a processor chain dedicated to sub-updates?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2012-05-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266919#comment-13266919
 ] 

Jan Høydahl commented on SOLR-2487:
---

Neil, I don't think we as a project will upload multiple WARs to Maven. You'll 
have to build your own using {{ant dist-war-excl-slf4j}}. Perhaps if you have a 
local repo such as Artifactory you can put a copy there, or upload it to some 
other repo that you can access.

> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
>     URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: logging, slf4j
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2487.patch, SOLR-2487.patch, SOLR-2487.patch
>
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.




[jira] [Commented] (SOLR-3431) Make dist-war-excl-slf4j target available in the Maven repository

2012-05-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267380#comment-13267380
 ] 

Jan Høydahl commented on SOLR-3431:
---

Great. Perhaps you would like to add this instruction to our Wiki somewhere -
http://wiki.apache.org/solr/HowToContribute#Maven or a new page?

> Make dist-war-excl-slf4j target available in the Maven repository
> -
>
> Key: SOLR-3431
> URL: https://issues.apache.org/jira/browse/SOLR-3431
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Neil Hooey
>Priority: Minor
>  Labels: maven, slf4j
>
> Since SOLR-2487 was closed, a new build target {{dist-war-excl-slf4j}} was 
> created for the Ant build, but this war file isn't in Maven yet.
> Users who want to build a Solr war with Maven and without slf4j included have 
> to expand the war file and delete the {{WEB-INF/lib/slf4j-jdk14-1.6.4.jar}} 
> file.
> With this target in Maven, expanding the war won't be necessary.




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-05-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269078#comment-13269078
 ] 

Jan Høydahl commented on SOLR-139:
--

Cool. Any plans to support modifying an existing value? The most useful 
operations would be add and subtract (for numeric fields) and append (for text 
fields). (In FAST ESP we had this as part of the partial update APIs.)
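
To make the three requested operations concrete, here is a minimal sketch of how they might be applied to a stored value. Everything in it is hypothetical: the `Op` enum, the `apply` method and the use of `long` for numeric fields are illustration only and do not describe any existing Solr or FAST ESP API.

```java
public class FieldModifier {
    /** Hypothetical modification operations named in the comment above. */
    enum Op { ADD, SUBTRACT, APPEND }

    /** Applies one operation to an existing field value and returns the new value. */
    static Object apply(Op op, Object existing, Object operand) {
        switch (op) {
            case ADD:
                return asLong(existing) + asLong(operand);
            case SUBTRACT:
                return asLong(existing) - asLong(operand);
            case APPEND:
                return String.valueOf(existing) + operand;
            default:
                throw new IllegalArgumentException("Unknown op: " + op);
        }
    }

    /** Numeric operations only make sense for numbers; reject anything else. */
    private static long asLong(Object v) {
        if (v instanceof Number) {
            return ((Number) v).longValue();
        }
        throw new IllegalArgumentException("Not numeric: " + v);
    }

    public static void main(String[] args) {
        System.out.println(apply(Op.ADD, 10, 5));
        System.out.println(apply(Op.SUBTRACT, 10, 5));
        System.out.println(apply(Op.APPEND, "foo", "bar"));
    }
}
```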

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-XmlUpdater.patch, SOLR-139.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




[jira] [Updated] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

2012-05-05 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3439:
--

Fix Version/s: 4.0

> Add "content" field to example schema to make SolrCell easier to use out of 
> the box
> ---
>
> Key: SOLR-3439
> URL: https://issues.apache.org/jira/browse/SOLR-3439
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction), Schema and 
> Analysis
>Reporter: Jack Krupansky
>Priority: Minor
> Fix For: 4.0
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a 
> document) to the "text" field which is the indexed-only (not stored) 
> catch-all for default queries. That searches fine, but doesn't show the 
> document content in the results, sometimes leading users to think that 
> something is wrong. Sure, the user can easily add the field (and this is 
> documented), but it would be a better user experience to have such a basic 
> feature work right out of the box without any config editing and without the 
> need for the user to read the fine print in the documentation.
> I propose that we add the "content" field to the example schema in the 
> section of fields already defined to support SolrCell metadata. It would be 
> stored and indexed.
> I further propose that a copyField be added for the "title", "description", 
> (and maybe a couple of others) and "content" fields to add them to the "text" 
> field for searching. Again, trying to improve the out of the box user 
> experience. It also simplifies testing - less setup.




[jira] [Commented] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

2012-05-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269080#comment-13269080
 ] 

Jan Høydahl commented on SOLR-3439:
---

I agree that this makes sense, and will not have any cost.

We could also make the Velocity GUI smart enough to detect whether the document 
is a "product" document, and output name, manufacturer, price, inStock etc., or 
whether it is a Tika doc or HTML, in which case it prints the title, dynamic 
teaser, document size, document type/MIME etc.

Finally we could add some PDFs to the exampledocs folder!

Do you want to attempt a first patch?





[jira] [Commented] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

2012-05-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269305#comment-13269305
 ] 

Jan Høydahl commented on SOLR-3439:
---

Really, the copyField thing in today's example schema is an *anti-pattern*, since 
we teach people to duplicate all their content while most people would be 
better off using DisMax. I have had several customers who built their whole 
search on the model from the example schema and then ran into performance 
problems due to the 2x index increase.

How would you feel if we instead got rid of *all* the copyFields and configured 
the default handler with {{&defType=edismax&qf=name,features,manu,content}}? 
Then we can leave a copyField section commented out in the schema, with an 
explanation of which use cases it is good for.
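
As a rough illustration of what such a request looks like on the wire, the sketch below assembles the edismax parameters into a URL query string using only the JDK. The class and method names are hypothetical. It assumes {{qf}} takes a whitespace-separated field list, as the DisMax documentation describes, so the fields are joined with spaces here rather than the commas used in the shorthand above; the sample query "ipod" is also just an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class EdismaxParams {
    /** Percent-encodes one key or value for use in a query string. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Joins parameters into a URL query string, keeping insertion order. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** Parameters that search several fields at query time instead of one catch-all field. */
    static Map<String, String> exampleParams(String userQuery) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "edismax");
        p.put("qf", "name features manu content"); // assumed whitespace-separated
        p.put("q", userQuery);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(exampleParams("ipod")));
    }
}
```

Because the field list lives in a request parameter, adding or removing a searched field is a configuration change only; nothing has to be re-indexed.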

> Add "content" field to example schema to make SolrCell easier to use out of 
> the box
> ---
>
> Key: SOLR-3439
> URL: https://issues.apache.org/jira/browse/SOLR-3439
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction), Schema and 
> Analysis
>Reporter: Jack Krupansky
>Priority: Minor
> Fix For: 4.0
>
> Attachments: Lincoln-Gettysburg-Address.docx, 
> Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a 
> document) to the "text" field which is the indexed-only (not stored) 
> catch-all for default queries. That searches fine, but doesn't show the 
> document content in the results, sometimes leading users to think that 
> something is wrong. Sure, the user can easily add the field (and this is 
> documented), but it would be a better user experience to have such a basic 
> feature work right out of the box without any config editing and without the 
> need for the user to read the fine print in the documentation.
> I propose that we add the "content" field to the example schema in the 
> section of fields already defined to support SolrCell metadata. It would be 
> stored and indexed.
> I further propose that a copyField be added for the "title", "description", 
> (and maybe a couple of others) and "content" fields to add them to the "text" 
> field for searching. Again, trying to improve the out of the box user 
> experience. It also simplifies testing - less setup.




[jira] [Created] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread JIRA
Jan Høydahl created SOLR-3442:
-

 Summary: Example schema switch to DisMax instead of CopyField
 Key: SOLR-3442
 URL: https://issues.apache.org/jira/browse/SOLR-3442
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jan Høydahl


Spinoff from SOLR-3439:

The use of copyField in today's example schema is an anti-pattern, since we 
indirectly teach people to duplicate most of their content, while most would be 
better off using DisMax, or at least a combination of the two.




[jira] [Commented] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

2012-05-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269485#comment-13269485
 ] 

Jan Høydahl commented on SOLR-3439:
---

bq. That said, I am a little reluctant to change the overall pattern/approach 
simply to add one field. Maybe the pattern change should be a separate issue.
SOLR-3442





[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269509#comment-13269509
 ] 

Jan Høydahl commented on SOLR-3442:
---

Sure, I've seen it successfully used too, and I use it myself now and then to 
reduce the number of fields required in "qf".

For very small indexes without much need for tuning analysis or relevancy it 
does not matter very much. But I'm arguing that copyField is the legacy way of 
searching multiple fields in one go, while DisMax is the current 
recommendation. So why stick to the legacy in the default example?

> Example schema switch to DisMax instead of CopyField
> 
>
> Key: SOLR-3442
> URL: https://issues.apache.org/jira/browse/SOLR-3442
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>  Labels: dismax
>
> Spinoff from SOLR-3439:
> The use of copyField in today's example schema is an anti-pattern, since we 
> indirectly teach people to duplicate most of their content, while most would 
> be better off using DisMax, or at least a combination of the two.




[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269603#comment-13269603
 ] 

Jan Høydahl commented on SOLR-3442:
---

I'm not saying anything is "dead". Both the "lucene" query parser and copyField 
have their place and are supported, and you can mix and match them with DisMax 
to fit your needs. But for the example we should select the most useful and 
flexible way to show indexing and search, and that is no longer a "text" 
catch-all with copyField. Aside from doubling the size of your index, it is 
inflexible in that adding or removing a field from search means a schema update 
and re-indexing. Catch-all fields with copyField can sometimes be used as a 
performance optimization, but that is not where you start.

Maintaining many examples has not proven to be a very good strategy. Look at the 
multi-core and DIH examples: they lag several versions behind when it comes to 
schema version and new solrconfig syntax. Instead, a single schema which can 
do both the product search and document search use cases well is easy to 
achieve. The Velocity GUI can be extended with two tabs if need be, one 
"products" tab and one "documents" tab. If we choose the example documents to 
index wisely, e.g. user guides for the products, we get a nice 
connection. You can search for "ipod" and see both products and user guides 
matching your search.





[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-05-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272293#comment-13272293
 ] 

Jan Høydahl commented on SOLR-3377:
---

I think we need better test coverage before this is ready.
We should add a bunch of tests with queries involving parens, to verify that 
they behave as expected. The tests should cover both parens intended as 
grouping for boolean precedence and parens not intended as boolean syntax but 
as plain text pasted from somewhere:

{noformat}
q=(foo OR title:bar) AND (title:foo OR bar)
q=Meeting at noon (room:Auditorium)
{noformat}

The first should obey the instructed boolean order, while the last should 
return docs with the literal token "room:Auditorium" in any of the qf fields.

The key goal of dismax is to be very robust, so people can paste in all kinds 
of garbage and still get matches. So if the query parses as valid boolean 
logic, that should be used.

> eDismax: A fielded query wrapped by parens is not recognized
> 
>
> Key: SOLR-3377
> URL: https://issues.apache.org/jira/browse/SOLR-3377
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 3.6
>Reporter: Jan Høydahl
>Assignee: Bernd Fehling
> Fix For: 4.0, 3.6.1
>
> Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
> SOLR-3377.patch
>
>
> As reported by "bernd" on the user list, a query like this
> {{q=(name:test)}}
> will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Commented] (SOLR-3448) Date math in range queries does not handle plus sign

2012-05-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272856#comment-13272856
 ] 

Jan Høydahl commented on SOLR-3448:
---

You're getting bitten by URL encoding: an unescaped + in a URL is decoded as a 
space, which is why the parser sees 'NOW/DAY 1DAY'. Try %2B instead of + :-)

{noformat}
facet.query=timestamp:[NOW-1YEAR/DAY%20TO%20NOW/DAY%2B1DAY]
{noformat}
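
The effect can be reproduced with the JDK's own form decoding. This is a small sketch, not Solr code: `PlusSignEncoding` and its helpers are hypothetical names, and `URLDecoder` is used here only as a stand-in for whatever decoding the servlet container applies to the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PlusSignEncoding {
    /** Decodes a query-string value the way form decoding does ('+' becomes a space). */
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Encodes a raw value so that a literal '+' survives as %2B. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // A bare '+' is lost: the date math arrives as "NOW/DAY 1DAY".
        System.out.println(decode("NOW/DAY+1DAY"));
        // An escaped plus arrives intact as "NOW/DAY+1DAY".
        System.out.println(decode("NOW/DAY%2B1DAY"));
        // Encoding the raw value up front avoids the problem entirely.
        System.out.println(encode("timestamp:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]"));
    }
}
```

Clients that build the URL programmatically can encode each parameter value once, as in the last call, instead of escaping individual characters by hand.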


> Date math in range queries does not handle plus sign
> 
>
> Key: SOLR-3448
> URL: https://issues.apache.org/jira/browse/SOLR-3448
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: Lance Norskog
>
> This query:
> {code}
> facet.query=timestamp:[NOW-1YEAR/DAY%20TO%20NOW/DAY+1DAY]
> {code}
> gives this error:
> {code}
> Cannot parse '[NOW-1YEAR/DAY TO NOW/DAY 1DAY]': Encountered " 
>  "1DAY "" at line 1, column 26.
> Was expecting one of:
> "]" ...
> "}" ...
> {code}
> Should the fix be to add a backslash in front of +1DAY? That does not work.




[jira] [Created] (SOLR-3450) CoreAdminHandler.handleStatusAction

2012-05-11 Thread JIRA
Trym Møller created SOLR-3450:
-

 Summary: CoreAdminHandler.handleStatusAction
 Key: SOLR-3450
 URL: https://issues.apache.org/jira/browse/SOLR-3450
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Linux version 2.6.32-29-server (buildd@allspice) (gcc 
version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 
2011

Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Reporter: Trym Møller
Priority: Minor


May 8, 2012 12:49:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error handling 'status' action 
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:551)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:161)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:360)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:173)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.IllegalArgumentException: 
/usr/lib/solr-4.0/example/dataDir/index.20120419210203/_kvon_0.frq does not 
exist
at org.apache.commons.io.FileUtils.sizeOf(FileUtils.java:2053)
at org.apache.commons.io.FileUtils.sizeOfDirectory(FileUtils.java:2089)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getIndexSize(CoreAdminHandler.java:837)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:822)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:542)
... 21 more
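
One possible reading of this trace is that a file listed in the index directory was deleted (for example by a merge or replication) before its size was read, so the size computation threw instead of skipping it. The sketch below shows a size computation that tolerates files vanishing mid-scan. It is an assumption-laden illustration, not the fix applied in Solr: `TolerantDirSize` and `sizeOf` are hypothetical names, and it relies only on the documented behaviour that `File.length()` returns 0 for a file that does not exist and `File.listFiles()` returns null for a path that is not a directory.

```java
import java.io.File;

public class TolerantDirSize {
    /**
     * Sums the sizes of all files under dir. Files or directories that
     * disappear while the scan is running simply contribute 0 bytes.
     */
    static long sizeOf(File dir) {
        File[] children = dir.listFiles(); // null if dir is gone or not a directory
        if (children == null) {
            return 0L;
        }
        long total = 0L;
        for (File f : children) {
            // length() returns 0 for a file that no longer exists
            total += f.isDirectory() ? sizeOf(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(sizeOf(dir) + " bytes under " + dir.getPath());
    }
}
```

The result is a best-effort number, which seems acceptable for a status report on an index that is being modified concurrently.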




[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

2012-05-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277085#comment-13277085
 ] 

Tomás Fernández Löbbe commented on SOLR-3455:
-

This doesn't look like a bug from the description, and I don't understand the 
summary: you are not using WordDelimiterFilterFactory in that field type. Your 
test searches seem to be giving the correct results.

> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> 
>
> Key: SOLR-3455
> URL: https://issues.apache.org/jira/browse/SOLR-3455
> Project: Solr
>  Issue Type: Bug
>Reporter: phatak.prachi
>Priority: Blocker
>
> • RET-34333
> • WAT-34333
> • RET 3
> • 34333
> When I search for RET => RET-34333, RET 3
> When I search for RET- => RET-34333
> When I search for 34333 => RET-34333, WAT-34333, 34333
> When I search for RET-3 => RET-34333
> When I search for RET-34333 => RET-34333
> When I search for T-3 => nothing returns 
> When I search for T 3 => nothing returns 
> Configuration:
> 
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>  maxGramSize="15" side="front"/>
> 
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  maxGramSize="15" side="front"/>
>  words="stopwords.txt"  enablePositionIncrements="true"  />
> 
> 
>
> 




[jira] [Updated] (SOLR-3453) edismax lowercaseOperators=false broken by SOLR-3026

2012-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-3453:


Attachment: SOLR-3026.patch

This patch includes the fix and a test case.

> edismax lowercaseOperators=false broken by SOLR-3026
> 
>
> Key: SOLR-3453
> URL: https://issues.apache.org/jira/browse/SOLR-3453
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6
>Reporter: Michael Ryan
> Attachments: SOLR-3026.patch
>
>
> The edismax lowercaseOperators=false option seems to have been broken by 
> SOLR-3026. "foo and bar" and "foo or bar" are treated as "foo AND bar" and 
> "foo OR bar", respectively, even when lowercaseOperators=false.
> Fix is rather simple, I think (though I haven't tested this). Current code:
> {noformat}if (i>0 && i+1   if ("AND".equalsIgnoreCase(s)) {
> s="AND";
>   } else if ("OR".equalsIgnoreCase(s)) {
> s="OR";
>   }
> }{noformat}
> Proposed code:
> {noformat}if (lowercaseOperators) {
>   if (i>0 && i+1 if ("AND".equalsIgnoreCase(s)) {
>   s="AND";
> } else if ("OR".equalsIgnoreCase(s)) {
>   s="OR";
> }
>   }
> }{noformat}
> Also interesting is the treatment of "Or" and "oR", but I'll leave that as an 
> exercise to the reader.
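
A minimal, self-contained sketch of the guard proposed above. The class name `OperatorNormalizer`, the method `normalize`, and the plain token list are assumptions made for illustration; this is not the edismax parser code itself. The boundary test (`i > 0 && i + 1 < tokens.size()`) is also an assumption standing in for the condition that is cut off in the quoted snippet.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorNormalizer {
  /**
   * Upper-cases "and"/"or" tokens so a downstream parser would see them as
   * operators, but only when lowercaseOperators is true.
   */
  public static List<String> normalize(List<String> tokens, boolean lowercaseOperators) {
    List<String> out = new ArrayList<>(tokens.size());
    for (int i = 0; i < tokens.size(); i++) {
      String s = tokens.get(i);
      // Assumed boundary rule: a token at the very start or end is never a binary operator.
      boolean inside = i > 0 && i + 1 < tokens.size();
      if (lowercaseOperators && inside) {
        if ("AND".equalsIgnoreCase(s)) {
          s = "AND";
        } else if ("OR".equalsIgnoreCase(s)) {
          s = "OR";
        }
      }
      out.add(s);
    }
    return out;
  }
}
```

With the flag set to false the tokens pass through untouched, so "foo and bar" keeps its lowercase "and". With the flag set to true, mixed-case tokens such as "Or" are upper-cased as well, which is the side effect the description hints at.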




[jira] [Updated] (SOLR-3453) edismax lowercaseOperators=false broken by SOLR-3026

2012-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-3453:


Attachment: SOLR-3026.patch

Improved the test case a bit. The patch is for trunk.

> edismax lowercaseOperators=false broken by SOLR-3026
> 
>
> Key: SOLR-3453
> URL: https://issues.apache.org/jira/browse/SOLR-3453
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6
>Reporter: Michael Ryan
> Attachments: SOLR-3026.patch, SOLR-3026.patch
>
>
> The edismax lowercaseOperators=false option seems to have been broken by 
> SOLR-3026. "foo and bar" and "foo or bar" are treated as "foo AND bar" and 
> "foo OR bar", respectively, even when lowercaseOperators=false.
> Fix is rather simple, I think (though I haven't tested this). Current code:
> {noformat}if (i>0 && i+1   if ("AND".equalsIgnoreCase(s)) {
> s="AND";
>   } else if ("OR".equalsIgnoreCase(s)) {
> s="OR";
>   }
> }{noformat}
> Proposed code:
> {noformat}if (lowercaseOperators) {
>   if (i>0 && i+1 if ("AND".equalsIgnoreCase(s)) {
>   s="AND";
> } else if ("OR".equalsIgnoreCase(s)) {
>   s="OR";
> }
>   }
> }{noformat}
> Also interesting is the treatment of "Or" and "oR", but I'll leave that as an 
> exercise to the reader.




[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

2012-05-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277847#comment-13277847
 ] 

Tomás Fernández Löbbe commented on SOLR-3455:
-

You are right, I don't see why that search matches RET-34333 and WAT-34333 with 
your field type. The field type that you provided doesn't use 
WordDelimiterFilterFactory, though. Have you pasted the correct one? Also, have 
you seen the other configuration attributes, like "generateNumberParts" and 
"splitOnNumerics"? This may be a configuration problem and not a bug, so you 
would probably get more help on the users list.

> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> 
>
> Key: SOLR-3455
> URL: https://issues.apache.org/jira/browse/SOLR-3455
> Project: Solr
>  Issue Type: Bug
>Reporter: phatak.prachi
>Priority: Blocker
>
> • RET-34333
> • WAT-34333
> • RET 3
> • 34333
> When I search for RET => RET-34333, RET 3
> When I search for RET- => RET-34333
> When I search for 34333 => RET-34333, WAT-34333, 34333
> When I search for RET-3 => RET-34333
> When I search for RET-34333 => RET-34333
> When I search for T-3 => nothing returns 
> When I search for T 3 => nothing returns 
> Configuration:
> 
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>  maxGramSize="15" side="front"/>
> 
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  maxGramSize="15" side="front"/>
>  words="stopwords.txt"  enablePositionIncrements="true"  />
> 
>     
>
> 




[jira] [Assigned] (SOLR-2058) Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with field~slop^boost syntax

2012-05-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2058:
-

Assignee: (was: Jan Høydahl)

James, you may grab it. I have a half-baked patch, but would rather take a stab 
later if there is anything to improve after your commit.

> Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with 
> field~slop^boost syntax
> 
>
> Key: SOLR-2058
> URL: https://issues.apache.org/jira/browse/SOLR-2058
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
> Environment: n/a
>Reporter: Ron Mayer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2058.patch, edismax_pf_with_slop_v2.1.patch, 
> edismax_pf_with_slop_v2.patch, pf2_with_slop.patch
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c4c659119.2010...@0ape.com%3E
> {quote}
> From  Ron Mayer 
> ... my results might be even better if I had a couple different "pf2"s with 
> different "ps"'s at the same time. In particular, one with ps=0 to put a 
> high boost on ones that have the right ordering of words. For example, 
> ensuring that [the query]:
>   "red hat black jacket"
>  boosts only documents with "red hats" and not "black hats".   And another 
> pf2 with a more modest boost with ps=5 or so to handle the query above also 
> boosting docs with 
>   "red baseball hat".
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3caanlktimd+v3g6d_mnhp+jykkd+dej8fvmvf_1lqoi...@mail.gmail.com%3E]
> {quote}
> From  Yonik Seeley 
> Perhaps fold it into the pf/pf2 syntax?
> pf=text^2    // current syntax... makes phrases with a boost of 2
> pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and
> a boost of 2
> That actually seems pretty natural given the lucene query syntax - an
> actual boosted sloppy phrase query already looks like
> {{text:"foo bar"~1^2}}
> -Yonik
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3calpine.deb.1.10.1008161300510.6...@radix.cryptio.net%3E]
> {quote}
> From  Chris Hostetter 
> Big +1 to this idea ... the existing "ps" param can stick around as the 
> default for any field that doesn't specify its own slop in the pf/pf2/pf3 
> fields using the "~" syntax.
> -Hoss
> {quote}
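
A rough sketch of how a `field~slop^boost` spec could be parsed, with the existing "ps" value acting as the fallback slop as suggested in the thread. The class `PhraseFieldSpec`, its `parse` method, the regular expression, and the default boost of 1.0 are all assumptions for illustration; they are not taken from any of the attached patches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseFieldSpec {
  // field name, optional "~slop", optional "^boost", e.g. "text~1^2"
  private static final Pattern SPEC =
      Pattern.compile("^([^~^\\s]+)(?:~(\\d+))?(?:\\^(\\d+(?:\\.\\d+)?))?$");

  public final String field;
  public final int slop;
  public final float boost;

  private PhraseFieldSpec(String field, int slop, float boost) {
    this.field = field;
    this.slop = slop;
    this.boost = boost;
  }

  /**
   * Parses one spec. defaultSlop plays the role of the "ps" parameter and is
   * used only when the spec carries no "~slop" of its own.
   */
  public static PhraseFieldSpec parse(String spec, int defaultSlop) {
    Matcher m = SPEC.matcher(spec.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Bad field spec: " + spec);
    }
    int slop = m.group(2) != null ? Integer.parseInt(m.group(2)) : defaultSlop;
    // Assumed default: a spec without "^boost" gets a neutral boost of 1.0.
    float boost = m.group(3) != null ? Float.parseFloat(m.group(3)) : 1.0f;
    return new PhraseFieldSpec(m.group(1), slop, boost);
  }
}
```

Under these assumptions, `parse("text~1^2", 0)` gives slop 1 and boost 2, while `parse("text^2", 5)` falls back to the default slop of 5.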




[jira] [Updated] (SOLR-2844) SolrJ: Make DocmentObjectBinder accept getter only fields (adapter pattern)

2012-05-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarosław Bojar updated SOLR-2844:
-

Attachment: SOLR-2844.patch

Another patch for @Field annotations on getters. This version also finds the 
matching setter method and sets the DocField.setter field correctly.

> SolrJ: Make DocmentObjectBinder accept getter only fields (adapter pattern)
> ---
>
> Key: SOLR-2844
> URL: https://issues.apache.org/jira/browse/SOLR-2844
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: Jens Wike
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-2844.patch, SOLR-2844.patch
>
>
> Our primary use case for SolrJ is to feed data into Solr. We commonly use an 
> adapter design pattern in our presentation or export layer in the 
> application. E.g. an adapter to flatten structured relational data for books 
> for a Solr import might look like this:
> class Book {
>   BookEntity book;
>   public String getTitle() { return book.getTitle(); }
>   public String getAuthorName() { return book.getAuthor().getName(); }
>   public Double getMinimumPrice() { return 
> priceService.calculateMinimumPrice(book); }
> }
> This is not working currently, because a setter has to be specified. So the 
> workaround is to write this code:
> class Book {
>   BookEntity book;
>   public String getTitle() { return book.getTitle(); }
>   @Field public void setTitle(String s) { }
>   public String getAuthorName() { return book.getAuthor().getName(); }
>   @Field public void setAuthorName(String s) { }
>   public Double getMinimumPrice() { return 
> priceService.calculateMinimumPrice(book); }
>   @Field public void setMinimumPrice(Double d) { }
> }
> So the scope of this improvement is to get rid of the dummy setters and to 
> support @Field on getters directly.
> I will add a patch proposal for this later on.
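
To make the request concrete, here is a small reflection sketch of reading values from annotated getters without any setter. It is only an illustration: `IndexedField` is a hypothetical stand-in for SolrJ's `@Field` so the example has no dependencies, and `GetterBinderSketch`, `toFields`, and the simplified `Book` adapter are invented names, not the approach taken in the attached patches.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class GetterBinderSketch {
  /** Hypothetical stand-in for SolrJ's @Field annotation. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface IndexedField {}

  /** Simplified adapter: annotated getters only, no dummy setters. */
  public static class Book {
    private final String title;
    private final String authorName;

    public Book(String title, String authorName) {
      this.title = title;
      this.authorName = authorName;
    }

    @IndexedField public String getTitle() { return title; }
    @IndexedField public String getAuthorName() { return authorName; }
    public String getInternalNote() { return "not indexed"; }
  }

  /** Collects "field name -> value" from every annotated no-arg getter. */
  public static Map<String, Object> toFields(Object bean) {
    Map<String, Object> fields = new TreeMap<>();
    for (Method m : bean.getClass().getMethods()) {
      String name = m.getName();
      if (!m.isAnnotationPresent(IndexedField.class)
          || m.getParameterCount() != 0
          || !name.startsWith("get")
          || name.length() <= 3) {
        continue;
      }
      // getAuthorName -> authorName
      String field = Character.toLowerCase(name.charAt(3)) + name.substring(4);
      try {
        fields.put(field, m.invoke(bean));
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot read " + name, e);
      }
    }
    return fields;
  }
}
```

The sketch covers only the write path (bean to document fields), which is the use case described here; reading documents back into a bean would still need a setter or a field to populate.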




[jira] [Commented] (SOLR-2058) Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with field~slop^boost syntax

2012-05-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281004#comment-13281004
 ] 

Jan Høydahl commented on SOLR-2058:
---

I did a lot of things in the same patch - including ps2, ps3 support and 
refactoring of FieldSpec parsing to a separate class, and adding test cases for 
boosting. But there is wrapping up to do and I don't know if I'm 100% happy 
with using RegEx for parsing fieldspec. I'll attach what I have, but as I say I 
think it is better to add some of these improvements incrementally.

> Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with 
> field~slop^boost syntax
> 
>
>     Key: SOLR-2058
> URL: https://issues.apache.org/jira/browse/SOLR-2058
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
> Environment: n/a
>Reporter: Ron Mayer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2058.patch, edismax_pf_with_slop_v2.1.patch, 
> edismax_pf_with_slop_v2.patch, pf2_with_slop.patch
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c4c659119.2010...@0ape.com%3E
> {quote}
> From  Ron Mayer 
> ... my results might be even better if I had a couple different "pf2"s with 
> different "ps"'s at the same time. In particular, one with ps=0 to put a 
> high boost on ones that have the right ordering of words. For example, 
> ensuring that [the query]:
>   "red hat black jacket"
>  boosts only documents with "red hats" and not "black hats".   And another 
> pf2 with a more modest boost with ps=5 or so to handle the query above also 
> boosting docs with 
>   "red baseball hat".
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3caanlktimd+v3g6d_mnhp+jykkd+dej8fvmvf_1lqoi...@mail.gmail.com%3E]
> {quote}
> From  Yonik Seeley 
> Perhaps fold it into the pf/pf2 syntax?
> pf=text^2    // current syntax... makes phrases with a boost of 2
> pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and
> a boost of 2
> That actually seems pretty natural given the lucene query syntax - an
> actual boosted sloppy phrase query already looks like
> {{text:"foo bar"~1^2}}
> -Yonik
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3calpine.deb.1.10.1008161300510.6...@radix.cryptio.net%3E]
> {quote}
> From  Chris Hostetter 
> Big +1 to this idea ... the existing "ps" param can stick around as the 
> default for any field that doesn't specify its own slop in the pf/pf2/pf3 
> fields using the "~" syntax.
> -Hoss
> {quote}




[jira] [Updated] (SOLR-2058) Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with field~slop^boost syntax

2012-05-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2058:
--

Attachment: SOLR-2058-and-3351-not-finished.patch

Attaching a patch that also does some ps2/ps3 stuff, adds more tests, etc., but 
it's not finished. Unfortunately it's big, partly due to whitespace differences.

> Adds optional "phrase slop" to edismax "pf2", "pf3" and "pf" parameters with 
> field~slop^boost syntax
> 
>
> Key: SOLR-2058
> URL: https://issues.apache.org/jira/browse/SOLR-2058
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
> Environment: n/a
>Reporter: Ron Mayer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2058-and-3351-not-finished.patch, SOLR-2058.patch, 
> edismax_pf_with_slop_v2.1.patch, edismax_pf_with_slop_v2.patch, 
> pf2_with_slop.patch
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c4c659119.2010...@0ape.com%3E
> {quote}
> From  Ron Mayer 
> ... my results might be even better if I had a couple different "pf2"s with 
> different "ps"'s at the same time. In particular, one with ps=0 to put a 
> high boost on ones that have the right ordering of words. For example, 
> ensuring that [the query]:
>   "red hat black jacket"
>  boosts only documents with "red hats" and not "black hats".   And another 
> pf2 with a more modest boost with ps=5 or so to handle the query above also 
> boosting docs with 
>   "red baseball hat".
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3caanlktimd+v3g6d_mnhp+jykkd+dej8fvmvf_1lqoi...@mail.gmail.com%3E]
> {quote}
> From  Yonik Seeley 
> Perhaps fold it into the pf/pf2 syntax?
> pf=text^2    // current syntax... makes phrases with a boost of 2
> pf=text~1^2  // proposed syntax... makes phrases with a slop of 1 and
> a boost of 2
> That actually seems pretty natural given the lucene query syntax - an
> actual boosted sloppy phrase query already looks like
> {{text:"foo bar"~1^2}}
> -Yonik
> {quote}
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3calpine.deb.1.10.1008161300510.6...@radix.cryptio.net%3E]
> {quote}
> From  Chris Hostetter 
> Big +1 to this idea ... the existing "ps" param can stick around as the 
> default for any field that doesn't specify its own slop in the pf/pf2/pf3 
> fields using the "~" syntax.
> -Hoss
> {quote}




[jira] [Updated] (SOLR-1833) ShowFileRequestHandle should allow you to specify which files can be viewed, not just which cannot be viewed

2012-05-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raphaël Droz updated SOLR-1833:
---

Attachment: SOLR-1833-ShowFileRequestHandler-whitelist.patch

This patch may do the trick.
Still unsure about invariants.getParams() return value (could it be an empty 
Set ?).

Whitelisting takes precedence and thus overrules any "hidden" invariant. I think 
this is a relatively safe behavior.
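
One way to read that precedence rule, as a sketch only: when a whitelist is configured, it alone decides, and the "hidden" list is consulted only when no whitelist exists. `FileVisibility` and `isViewable` are hypothetical names, and treating an empty whitelist as "not configured" is an assumption; the attached patch may handle that case differently.

```java
import java.util.Set;

public class FileVisibility {
  /**
   * Decides whether a file may be shown.
   * A non-empty whitelist takes precedence over the hidden set.
   */
  public static boolean isViewable(String name, Set<String> whitelist, Set<String> hidden) {
    if (whitelist != null && !whitelist.isEmpty()) {
      // Whitelist configured: only listed files are viewable, hidden is ignored.
      return whitelist.contains(name);
    }
    // No whitelist: fall back to the existing blacklist behavior.
    return hidden == null || !hidden.contains(name);
  }
}
```

The appeal of this ordering is that files added later stay invisible by default once a whitelist is in place, which matches the motivation given in the issue description.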


> ShowFileRequestHandle should allow you to specify which files can be viewed, 
> not just which cannot be viewed
> 
>
> Key: SOLR-1833
> URL: https://issues.apache.org/jira/browse/SOLR-1833
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Mark Miller
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-1833-ShowFileRequestHandler-whitelist.patch
>
>
> In many cases I wouldn't want to come up with every file I want hidden - 
> especially when new files may be added in the future - often you would want 
> to explicitly say which files can be viewed - this is how the old 
> gettableFiles used to work.




[jira] [Created] (SOLR-3485) Make /browse (files and handlers) dependencies self URL-contained

2012-05-24 Thread JIRA
Raphaël Droz created SOLR-3485:
--

 Summary: Make /browse (files and handlers) dependencies self 
URL-contained
 Key: SOLR-3485
 URL: https://issues.apache.org/jira/browse/SOLR-3485
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Raphaël Droz
Priority: Minor


Assuming that /browse may be, now or later, safe for public use, it would be 
very useful to make it "self-contained" in a given URL pattern in order to 
allow URL-based access restrictions.

There are 3 issues here :
* static files (css/js/img)
* external handlers like /terms, /clustering
* pattern switch between /browse/* and /collection1/browse/*

I only try to address the 1st issue, in the comment below.
If both /terms and /clustering are safe to be public, then issue 2 may be 
omitted.





[jira] [Updated] (SOLR-3485) Make /browse (files and handlers) dependencies self URL-contained

2012-05-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raphaël Droz updated SOLR-3485:
---

Attachment: SOLR-3485-browse-static-files-URL-1.patch

The patch affects the example configuration:
* changes the expected location of jquery.autocomplete.* and main.css
* creates the corresponding /browse/file solr.admin.ShowFileRequestHandler.

It makes use of the patch provided in issue #SOLR-1833 in order to provide 
access to the restricted set of files absolutely needed and explicitly allowed.

> Make /browse (files and handlers) dependencies self URL-contained
> -
>
> Key: SOLR-3485
> URL: https://issues.apache.org/jira/browse/SOLR-3485
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Raphaël Droz
>Priority: Minor
> Attachments: SOLR-3485-browse-static-files-URL-1.patch
>
>
> Assuming that /browse may be, now or later, safe for a public use it would be 
> very useful to make it "self-contained" in a given URL pattern in order to 
> allow URL-based access restrictions.
> There are 3 issues here :
> * static files (css/js/img)
> * external handlers like /terms, /clustering
> * pattern switch between /browse/* and /collection1/browse/*
> I only try to address the 1st issue, in the comment below.
> If both /terms and /clustering are safe to be public, then issue 2 may be 
> omitted.




[jira] [Updated] (SOLR-3485) Make /browse (files and handlers) dependencies self URL-contained

2012-05-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raphaël Droz updated SOLR-3485:
---

Comment: was deleted

(was: Not really a blocker but whitelisting allowed files is probably the 
preferred way.)

> Make /browse (files and handlers) dependencies self URL-contained
> -
>
> Key: SOLR-3485
> URL: https://issues.apache.org/jira/browse/SOLR-3485
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Raphaël Droz
>Priority: Minor
> Attachments: SOLR-3485-browse-static-files-URL-1.patch
>
>
> Assuming that /browse may be, now or later, safe for a public use it would be 
> very useful to make it "self-contained" in a given URL pattern in order to 
> allow URL-based access restrictions.
> There are 3 issues here :
> * static files (css/js/img)
> * external handlers like /terms, /clustering
> * pattern switch between /browse/* and /collection1/browse/*
> I only try to address the 1st issue, in the comment below.
> If both /terms and /clustering are safe to be public, then issue 2 may be 
> omitted.




[jira] [Comment Edited] (SOLR-3485) Make /browse (files and handlers) dependencies self URL-contained

2012-05-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282500#comment-13282500
 ] 

Raphaël Droz edited comment on SOLR-3485 at 5/24/12 1:34 PM:
-

The patch affects the example configuration:
* changes the expected location of jquery.autocomplete.* and main.css
* creates the corresponding /browse/file solr.admin.ShowFileRequestHandler.

It makes use of the patch provided in issue SOLR-1833 in order to provide 
access to the restricted set of files absolutely needed and explicitly allowed.

  was (Author: drzraf):
patch affects the example configuration :
* changes the location of expected for jquery.autocomplete.* and main.css
* creates the corresponding /browse/file solr.admin.ShowFileRequestHandler.

It makes use of the patch provided in issue #SOLR-1833 in order to provide 
access to the restricted set of files absolutely needed and explicitly allowed.
  
> Make /browse (files and handlers) dependencies self URL-contained
> -
>
> Key: SOLR-3485
> URL: https://issues.apache.org/jira/browse/SOLR-3485
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Raphaël Droz
>Priority: Minor
> Attachments: SOLR-3485-browse-static-files-URL-1.patch
>
>
> Assuming that /browse may be, now or later, safe for a public use it would be 
> very useful to make it "self-contained" in a given URL pattern in order to 
> allow URL-based access restrictions.
> There are 3 issues here :
> * static files (css/js/img)
> * external handlers like /terms, /clustering
> * pattern switch between /browse/* and /collection1/browse/*
> I only try to address the 1st issue, in the comment below.
> If both /terms and /clustering are safe to be public, then issue 2 may be 
> omitted.




[jira] [Reopened] (LUCENE-2566) + - operators allow any amount of whitespace

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reopened LUCENE-2566:
-

  Assignee: Jan Høydahl

Re-opening for backport

> + - operators allow any amount of whitespace
> 
>
> Key: LUCENE-2566
> URL: https://issues.apache.org/jira/browse/LUCENE-2566
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Reporter: Yonik Seeley
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2566.patch
>
>
> As an example, (foo - bar) is treated like (foo -bar).
> It seems like for +- to be treated as unary operators, they should be 
> immediately followed by the operand.
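
The rule in the last sentence can be sketched as a simple character check. This is not the query parser grammar change from the patch; `UnaryOperatorCheck` and `isUnaryOperator` are hypothetical names, and the extra requirement that the operator starts a clause (start of input, after whitespace, or after an opening parenthesis) is an assumption added so that a hyphen inside a term is not counted.

```java
public class UnaryOperatorCheck {
  /**
   * Returns true when the '+' or '-' at position i should act as a unary
   * operator, i.e. it is immediately followed by its operand.
   */
  public static boolean isUnaryOperator(String query, int i) {
    char c = query.charAt(i);
    if (c != '+' && c != '-') {
      return false;
    }
    // Assumed: the operator must begin a clause rather than sit inside a term.
    boolean startsClause = i == 0
        || Character.isWhitespace(query.charAt(i - 1))
        || query.charAt(i - 1) == '(';
    // The rule from the description: no whitespace between operator and operand.
    boolean hasOperand = i + 1 < query.length()
        && !Character.isWhitespace(query.charAt(i + 1));
    return startsClause && hasOperand;
  }
}
```

Under this check the '-' in "foo - bar" is not a unary operator, while the '-' in "foo -bar" is.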




[jira] [Updated] (LUCENE-2566) + - operators allow any amount of whitespace

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated LUCENE-2566:


Attachment: LUCENE-2566-3x.patch

Backport to 3.6 branch. All tests pass. Committing soon.

> + - operators allow any amount of whitespace
> 
>
> Key: LUCENE-2566
> URL: https://issues.apache.org/jira/browse/LUCENE-2566
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 3.6
>Reporter: Yonik Seeley
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: LUCENE-2566-3x.patch, LUCENE-2566.patch
>
>
> As an example, (foo - bar) is treated like (foo -bar).
> It seems like for +- to be treated as unary operators, they should be 
> immediately followed by the operand.




[jira] [Updated] (LUCENE-2566) + - operators allow any amount of whitespace

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated LUCENE-2566:


Affects Version/s: 3.6
Fix Version/s: 3.6.1

> + - operators allow any amount of whitespace
> 
>
> Key: LUCENE-2566
> URL: https://issues.apache.org/jira/browse/LUCENE-2566
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 3.6
>Reporter: Yonik Seeley
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: LUCENE-2566-3x.patch, LUCENE-2566.patch
>
>
> As an example, (foo - bar) is treated like (foo -bar).
> It seems like for +- to be treated as unary operators, they should be 
> immediately followed by the operand.




[jira] [Commented] (LUCENE-4074) FST Sorter BufferSize causes int overflow if BufferSize > 2048MB

2012-05-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283303#comment-13283303
 ] 

Jan Høydahl commented on LUCENE-4074:
-

Checked in a fix in 3.6 for the non-compiling TestSort.testRamBuffer. It referred 
to random().nextInt() instead of random.nextInt(), a clear copy/paste error from 
trunk code.

> FST Sorter BufferSize causes int overflow if BufferSize > 2048MB
> 
>
> Key: LUCENE-4074
> URL: https://issues.apache.org/jira/browse/LUCENE-4074
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/spellchecker
>Affects Versions: 3.6, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0, 3.6.1
>
> Attachments: LUCENE-4074.patch
>
>
> The BufferSize constructor accepts the size in MB as an integer and uses 
> multiplication to convert it to bytes. While it checks that the size in bytes 
> is less than 2048 MB, it does that after the byte conversion. If you pass a 
> value > 2047 to the ctor, the value overflows, since all constants and methods 
> based on MB expect 32-bit signed ints. This does not even result in an 
> exception until the BufferSize is actually passed to the sorter.
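
The overflow is easy to reproduce in isolation. The sketch below is not the Lucene `BufferSize` class; `BufferSizeSketch`, `naiveBytes`, and `checkedBytes` are made-up names that contrast converting first with validating the MB value before the conversion.

```java
public final class BufferSizeSketch {
  private static final long MB = 1024L * 1024L;

  /** Plain int multiplication: silently wraps around for mb >= 2048. */
  public static int naiveBytes(int mb) {
    return mb * 1024 * 1024;
  }

  /** Validates the MB value first, then converts using long arithmetic. */
  public static int checkedBytes(int mb) {
    // Integer.MAX_VALUE / MB is 2047, the largest MB count that fits in an int.
    if (mb < 0 || mb > Integer.MAX_VALUE / MB) {
      throw new IllegalArgumentException(
          "Buffer too large for an int byte count: " + mb + " MB");
    }
    return (int) (mb * MB);
  }
}
```

Here `naiveBytes(2048)` wraps to `Integer.MIN_VALUE`, so a "less than 2048 MB" check on the converted value never sees the real size, whereas `checkedBytes(2048)` fails immediately at construction time.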




[jira] [Resolved] (LUCENE-2566) + - operators allow any amount of whitespace

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved LUCENE-2566.
-

Resolution: Fixed

Checked in for 3.6.1

> + - operators allow any amount of whitespace
> 
>
> Key: LUCENE-2566
> URL: https://issues.apache.org/jira/browse/LUCENE-2566
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 3.6
>Reporter: Yonik Seeley
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: LUCENE-2566-3x.patch, LUCENE-2566.patch
>
>
> As an example, (foo - bar) is treated like (foo -bar).
> It seems like for +- to be treated as unary operators, they should be 
> immediately followed by the operand.




[jira] [Resolved] (SOLR-3489) Config file replication less error prone

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3489.
---

Resolution: Fixed

Thanks for reporting. Your patch (which is identical to the trunk code) is 
committed to branch 3_6

> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: SOLR-3489.patch, SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.
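
The tolerant behaviour requested above amounts to trimming each entry after splitting on the comma. A minimal sketch follows, assuming a hypothetical helper `parseFileList`; it is not the ReplicationHandler code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the actual replication code.
final class ConfFilesSketch {
    // Splits a comma-separated file list and trims whitespace around each name.
    static List<String> parseFileList(String raw) {
        List<String> names = new ArrayList<>();
        if (raw == null) {
            return names;
        }
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) { // skip empty entries such as "a,,b"
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parseFileList("schema.xml,test.txt, stopwords.txt"));
        // prints [schema.xml, test.txt, stopwords.txt]
    }
}
```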




[jira] [Assigned] (SOLR-3489) Config file replication less error prone

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-3489:
-

Assignee: Jan Høydahl

> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: SOLR-3489.patch, SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.




[jira] [Updated] (SOLR-3351) eDismax: ps2 and ps3 params

2012-05-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3351:
--

Attachment: SOLR-3351.patch

Patch adding ps2 and ps3 params, including test cases

> eDismax: ps2 and ps3 params
> ---
>
> Key: SOLR-3351
> URL: https://issues.apache.org/jira/browse/SOLR-3351
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-3351.patch
>
>
> Add support for custom Phrase Slop for "pf2" and "pf3" of edismax. If not 
> specified, it should use "ps" as today.
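
The fallback rule in the description ("if not specified, use ps") can be sketched as a simple parameter lookup. This is an illustration only: `resolveSlop` is a hypothetical helper, the plain `Map` stands in for Solr's request parameters, and the final default of 0 when "ps" is also absent is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the ps2/ps3 fallback; not the edismax parser code.
final class PhraseSlopSketch {
    // Returns the slop for "ps2" or "ps3", falling back to "ps" when unset.
    // Assumption: a slop of 0 is used when "ps" is missing as well.
    static int resolveSlop(Map<String, String> params, String specific) {
        String value = params.get(specific);
        if (value == null) {
            value = params.get("ps");
        }
        return value == null ? 0 : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("ps", "2");
        params.put("ps2", "1");
        System.out.println(resolveSlop(params, "ps2")); // prints 1
        System.out.println(resolveSlop(params, "ps3")); // prints 2 (falls back to ps)
    }
}
```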




[jira] [Reopened] (SOLR-3489) Config file replication less error prone

2012-05-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reopened SOLR-3489:
---


> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: SOLR-3489.patch, SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.




[jira] [Updated] (SOLR-3489) Config file replication less error prone

2012-05-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3489:
--

Fix Version/s: 3.6.1
   4.0

> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: SOLR-3489.patch, SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.




[jira] [Updated] (SOLR-3489) Config file replication less error prone

2012-05-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3489:
--

Attachment: SOLR-3489.patch

New smaller patch, for trunk

> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: SOLR-3489.patch, SOLR-3489.patch, 
> SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.




[jira] [Resolved] (SOLR-3489) Config file replication less error prone

2012-05-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3489.
---

Resolution: Fixed

You were right Jochen - my bad. I simplified the patch a bit and committed to 
both trunk and branch.

> Config file replication less error prone
> 
>
> Key: SOLR-3489
> URL: https://issues.apache.org/jira/browse/SOLR-3489
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Affects Versions: 3.6
>Reporter: Jochen Just
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 3.6.1
>
> Attachments: SOLR-3489.patch, SOLR-3489.patch, 
> SOLR-3489_reproducing_config.tar.gz
>
>
> If the listing of configuration files that should be replicated contains a 
> space, the following file is not replicated.
> Example:
> {code:xml}
> 
> schema.xml,test.txt, stopwords.txt
> {code}
> It would be nice, if that space simply would be ignored.




[jira] [Updated] (SOLR-3477) SOLR does not start up when no cores are defined

2012-06-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-3477:


Attachment: SOLR-3477-3_6.patch
SOLR-3477.patch

I checked this and saw the same as Tommaso: it seems to work on trunk and the 
4x branch. 
I added a test case that starts the CoreContainer with no cores (a solr.xml file 
with an empty list of cores). It works on trunk and fails on 3.6 with an 
exception like the one described in this issue.

> SOLR does not start up when no cores are defined
> 
>
> Key: SOLR-3477
> URL: https://issues.apache.org/jira/browse/SOLR-3477
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6
> Environment: All environments
>Reporter: Sebastian Schaffert
>Priority: Critical
> Attachments: SOLR-3477-3_6.patch, SOLR-3477.patch
>
>
> Since version 3.6.0, Solr does not start up when no cores are defined in 
> solr.xml. The problematic code is in CoreContainer.java, lines 171-173.
> org.apache.solr.common.SolrException: No cores were created, please check the 
> logs for errors
>   at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
>  ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) 
> ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
> ...
> In our case, this is however a valid situation, because we create the cores 
> programmatically by calling the webservices to register new cores. The server 
> is initially started with no cores defined, and depending on the 
> configuration of our application, cores are then created dynamically.
> For the time being, we have to stick with version 3.5, which did not have 
> this problem (or feature).




[jira] [Commented] (SOLR-1680) Provide an API to specify custom Collectors

2012-06-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289762#comment-13289762
 ] 

Tomás Fernández Löbbe commented on SOLR-1680:
-

Why not use a Factory that could be changed from the solrconfig.xml file?

> Provide an API to specify custom Collectors
> ---
>
> Key: SOLR-1680
> URL: https://issues.apache.org/jira/browse/SOLR-1680
> Project: Solr
>  Issue Type: Sub-task
>  Components: search
>Affects Versions: 1.3
>Reporter: Martijn van Groningen
> Fix For: 4.1
>
> Attachments: SOLR-1680.patch, field-collapse-core.patch
>
>
> This issue is dedicated to incorporating the field-collapse changes into 
> Solr's core code. 
> We want to make it possible for components to specify custom Collectors in 
> SolrIndexSearcher methods.




[jira] [Commented] (LUCENE-3854) Non-tokenized fields become tokenized when a document is deleted and added back

2012-06-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293446#comment-13293446
 ] 

András Péteri commented on LUCENE-3854:
---

Isn't this considered a regression from 3.x? In 3.6.0 I'm seeing an additional 
byte being read from the stream in FieldsReader, which contained bits that 
allowed the reader to reconstruct the Index enum correctly for the field. This 
should make it possible to properly update a document in which all fields were 
stored, with the exception of boost values (and they could be stored 
redundantly in a field as well to overcome this limitation).

> Non-tokenized fields become tokenized when a document is deleted and added 
> back
> ---
>
> Key: LUCENE-3854
> URL: https://issues.apache.org/jira/browse/LUCENE-3854
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Benson Margulies
>
> https://github.com/bimargulies/lucene-4-update-case is a JUnit test case that 
> seems to show a problem with the current trunk. It creates a document with a 
> Field typed as StringField.TYPE_STORED and a value with a "-" in it. A 
> TermQuery can find the value, initially, since the field is not tokenized.
> Then, the case reads the Document back out through a reader. In the copy of 
> the Document that gets read out, the Field now has the tokenized bit turned 
> on. 
> Next, the case deletes and adds the Document. The 'tokenized' bit is 
> respected, so now the field gets tokenized, and the result is that the query 
> on the term with the - in it no longer works.
> So I think that the defect here is in the code that reconstructs the Document 
> when read from the index, and which turns on the tokenized bit.
> I have an ICLA on file so you can take this code from github, but if you 
> prefer I can also attach it here.




[jira] [Created] (SOLR-3544) Under heavy load json response is cut at some arbitrary position

2012-06-14 Thread JIRA
Dušan Omerčević created SOLR-3544:
-

 Summary: Under heavy load json response is cut at some arbitrary 
position
 Key: SOLR-3544
 URL: https://issues.apache.org/jira/browse/SOLR-3544
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
 Environment: Linux version 2.6.32-5-amd64 (Debian 2.6.32-38) 
(b...@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) )
Reporter: Dušan Omerčević


We query Solr for 30K documents using JSON as the response format. Normally 
this works perfectly fine. But when the machine comes under heavy load (all 
cores utilized), the response gets interrupted at an arbitrary position. We 
circumvented the problem by switching to the XML response format.

I've written the full description here: 
http://restreaming.wordpress.com/2012/06/14/the-curious-case-of-solr-malfunction/




[jira] [Commented] (SOLR-3416) HTTP ERROR 400. Problem accessing /solr/select/. Reason: undefined field text

2012-06-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295693#comment-13295693
 ] 

Jörg von Frantzius commented on SOLR-3416:
--

There is a query for warming up the cache in the default solrconfig.xml, 
which makes use of the default search field.

I had set the "defaultSearchField" in the schema, but that didn't seem to get 
picked up.

So I changed the query for {{

> HTTP ERROR 400. Problem accessing /solr/select/. Reason: undefined field 
> text
> -
>
> Key: SOLR-3416
>     URL: https://issues.apache.org/jira/browse/SOLR-3416
> Project: Solr
>  Issue Type: Bug
> Environment: Fedora 13 (Goddard)
>Reporter: uday shankar singh
>  Labels: apche, jetty, nutch, solr
>
> I've got a Solr instance running on my Ubuntu machine using the default Jetty 
> server that the Solr download comes with. Whenever I start Solr using 
> java -jar start.jar 
> The server starts fine but there is always an exception thrown: 
> INFO: SolrUpdateServlet.init() done 
> 2012-04-26 11:36:59.630:INFO::Started SocketConnector@0.0.0.0:8983 
> Apr 26, 2012 11:37:14 AM org.apache.solr.common.SolrException log 
> SEVERE: org.apache.solr.common.SolrException: undefined field text 
> As I said though, the server will still start and I can see the Solr admin 
> interface. I defined my schema as follows. 
>  
>  
>  
>  
>  
> id 
> When I attempt to run a query using the admin interface, 
> the default query, i.e. *:*, or from the URL using: 
> http://localhost:8983/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on
>  
> It correctly returns all the data that I crawled using Nutch. 
> However, the moment I try to query using text in the admin interface or 
> through the URL I receive an HTTP ERROR 400. 
> url: 
> http://localhost:8983/solr/select/?q=fruit&version=2.2&start=0&rows=10&indent=on
>  
> --- returns --- 
> HTTP ERROR 400 
> Problem accessing /solr/select/. Reason: 
> undefined field text 
> Powered by Jetty:// 




[jira] [Commented] (SOLR-3544) Under heavy load json response is cut at some arbitrary position

2012-06-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295913#comment-13295913
 ] 

Dušan Omerčević commented on SOLR-3544:
---

Ad 1) We're using Tomcat.

Ad 2) We fetch up to 40K documents, with each document being a few KB in size. 
The total response is some 100MB in size. The cutoff really happened at an 
arbitrary position, from a few KB up to several MB into the response.

Ad 3) I've checked it thoroughly and I couldn't find any pattern. It really 
seems that the problem is independent of the data.

Ad 4) The full message is:
2012-06-11 12:23:34 ERROR: ClientAbortException:  java.net.SocketException: 
Broken pipe
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:370)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:323)
at 
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:396)
at 
org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:385)
at 
org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:113)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:750)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:432)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:347)
at 
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:773)
at 
org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:127)
at 
org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:583)
at org.apache.coyote.Response.doWrite(Response.java:560)
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:365)
... 25 more

2012-06-11 12:23:34 ERROR: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
at 
org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:405)
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(Standar

[jira] [Updated] (SOLR-3351) eDismax: ps2 and ps3 params

2012-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3351:
--

Fix Version/s: 5.0

Preparing for commit...

> eDismax: ps2 and ps3 params
> ---
>
> Key: SOLR-3351
> URL: https://issues.apache.org/jira/browse/SOLR-3351
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3351.patch
>
>
> Add support for custom Phrase Slop for "pf2" and "pf3" of edismax. If not 
> specified, it should use "ps" as today.




[jira] [Resolved] (SOLR-3351) eDismax: ps2 and ps3 params

2012-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3351.
---

Resolution: Fixed

Committed. See docs at http://wiki.apache.org/solr/ExtendedDisMax

> eDismax: ps2 and ps3 params
> ---
>
> Key: SOLR-3351
> URL: https://issues.apache.org/jira/browse/SOLR-3351
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3351.patch
>
>
> Add support for custom Phrase Slop for "pf2" and "pf3" of edismax. If not 
> specified, it should use "ps" as today.




[jira] [Updated] (SOLR-1856) In Solr Cell, literals should override Tika-parsed values

2012-06-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1856:
--

Fix Version/s: 5.0

> In Solr Cell, literals should override Tika-parsed values
> -
>
> Key: SOLR-1856
> URL: https://issues.apache.org/jira/browse/SOLR-1856
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris Harris
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-1856.patch
>
>
> I propose that ExtractingRequestHandler / SolrCell literals should take 
> precedence over Tika-parsed metadata in all situations, including where 
> multiValued="true". (Compare SOLR-1633?)
> My personal motivation is that I have several fields (e.g. "title", "date") 
> where my own metadata is much superior to what Tika offers, and I want to 
> throw those Tika values away. (I actually wouldn't mind throwing away _all_ 
> Tika-parsed values, but let's set that aside.) SOLR-1634 is one potential 
> approach to this, but the fix here might be simpler.
> I'll attach a patch shortly.




[jira] [Updated] (SOLR-1856) In Solr Cell, literals should override Tika-parsed values

2012-06-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1856:
--

Attachment: SOLR-1856.patch

Updated patch for trunk, with /trunk as base, not /solr.

I added the request param literalsOverride=true|false which defaults to true, 
and documented it at http://wiki.apache.org/solr/ExtractingRequestHandler
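
The effect of the parameter can be sketched as a merge of two field maps. This is a minimal sketch, not the ExtractingRequestHandler implementation: `merge` is a hypothetical helper, and the behaviour for `literalsOverride=false` (literal values appended to the Tika values, which requires a multiValued field) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the actual Solr Cell code.
final class LiteralsOverrideSketch {
    // Merges Tika-parsed field values with literal values supplied on the request.
    // literalsOverride=true: a literal replaces any Tika values for that field.
    // literalsOverride=false (assumed): the literal values are appended instead.
    static Map<String, List<String>> merge(Map<String, List<String>> tika,
                                           Map<String, List<String>> literals,
                                           boolean literalsOverride) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : tika.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (Map.Entry<String, List<String>> e : literals.entrySet()) {
            if (literalsOverride || !out.containsKey(e.getKey())) {
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
            } else {
                out.get(e.getKey()).addAll(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tika = new LinkedHashMap<>();
        tika.put("title", Arrays.asList("Tika Title"));
        tika.put("author", Arrays.asList("Tika Author"));
        Map<String, List<String>> literals = new LinkedHashMap<>();
        literals.put("title", Arrays.asList("My Title"));

        System.out.println(merge(tika, literals, true));
        // prints {title=[My Title], author=[Tika Author]}
        System.out.println(merge(tika, literals, false));
        // prints {title=[Tika Title, My Title], author=[Tika Author]}
    }
}
```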

Think this is ready for commit, will then backport to 4.x

> In Solr Cell, literals should override Tika-parsed values
> -
>
> Key: SOLR-1856
> URL: https://issues.apache.org/jira/browse/SOLR-1856
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris Harris
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-1856.patch, SOLR-1856.patch
>
>
> I propose that ExtractingRequestHandler / SolrCell literals should take 
> precedence over Tika-parsed metadata in all situations, including where 
> multiValued="true". (Compare SOLR-1633?)
> My personal motivation is that I have several fields (e.g. "title", "date") 
> where my own metadata is much superior to what Tika offers, and I want to 
> throw those Tika values away. (I actually wouldn't mind throwing away _all_ 
> Tika-parsed values, but let's set that aside.) SOLR-1634 is one potential 
> approach to this, but the fix here might be simpler.
> I'll attach a patch shortly.




[jira] [Updated] (SOLR-1929) Index encrypted files

2012-06-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1929:
--

  Description: SolrCell should be able to index encrypted files (pdfs, word 
docs).  (was: SolrCell is not able to index encrypted pdfs.
This is easily done by supplying the password in the metadata passed on to tika)
Fix Version/s: 5.0
  Summary: Index encrypted files  (was: Index encrypted pdf files)

For PDFs there was a possibility of supplying the password in the metadata 
passed on to tika (as the first patch here). However, since TIKA-850, we can 
now supply a PasswordProvider on the context, which will provide the password 
and is future proof for any document type which supports it.
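
The provider idea can be sketched without depending on Tika. Everything in the block below is a hypothetical stand-in: `PasswordSource` only mimics the role of a password provider placed on the parse context, and the file-name rule lookup is an assumed policy, not what the patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in types; this is not the Tika PasswordProvider API.
final class PasswordLookupSketch {
    // Plays the role of a password provider: given a resource name, return
    // the password to try, or null if none is known.
    interface PasswordSource {
        String passwordFor(String resourceName);
    }

    // An explicitly supplied password wins; otherwise the first rule whose
    // pattern matches the whole resource name supplies the password.
    static PasswordSource fromRules(Map<Pattern, String> rules, String explicit) {
        return resourceName -> {
            if (explicit != null) {
                return explicit;
            }
            if (resourceName != null) {
                for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
                    if (rule.getKey().matcher(resourceName).matches()) {
                        return rule.getValue();
                    }
                }
            }
            return null;
        };
    }

    public static void main(String[] args) {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile(".*\\.pdf"), "pdfSecret");
        PasswordSource source = fromRules(rules, null);
        System.out.println(source.passwordFor("report.pdf")); // prints pdfSecret
        System.out.println(source.passwordFor("notes.doc"));  // prints null
    }
}
```

Because the parser asks the provider for a password instead of reading it from a fixed metadata key, the same mechanism can serve any document type that supports encryption.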

> Index encrypted files
> -
>
> Key: SOLR-1929
> URL: https://issues.apache.org/jira/browse/SOLR-1929
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Yiannis Pericleous
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-1929.patch
>
>
> SolrCell should be able to index encrypted files (pdfs, word docs).




[jira] [Updated] (SOLR-3134) Include shard Information in response

2012-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Galić updated SOLR-3134:
-

Attachment: SOLR-3134-shard-info-3_6-backport.patch

Fix 3_x-backport to work with 3_6

> Include shard Information in response
> -
>
> Key: SOLR-3134
> URL: https://issues.apache.org/jira/browse/SOLR-3134
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 4.0
>
> Attachments: SOLR-3134-shard-info-3_5-backport.patch, 
> SOLR-3134-shard-info-3_6-backport.patch, 
> SOLR-3134-shard-info-3_x-backport.patch, SOLR-3134-shard-info-fix.patch, 
> SOLR-3134-shard-info.patch
>
>
> For distributed search where each shard represents a logically different 
> index (or physical location), it would be great to know the hit count for 
> each shard.
> In addition, it would be nice to get error info for each shard rather than 
> aborting the whole request when something fails.




[jira] [Comment Edited] (SOLR-3134) Include shard Information in response

2012-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400491#comment-13400491
 ] 

Igor Galić edited comment on SOLR-3134 at 6/25/12 2:51 PM:
---

Fix 3_x-backport to work with 3_6

oh, well.. Doesn't actually compile :-/

  was (Author: i.galic):
Fix 3_x-backport to work with 3_6
  
> Include shard Information in response
> -
>
> Key: SOLR-3134
> URL: https://issues.apache.org/jira/browse/SOLR-3134
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 4.0
>
> Attachments: SOLR-3134-shard-info-3_5-backport.patch, 
> SOLR-3134-shard-info-3_6-backport.patch, 
> SOLR-3134-shard-info-3_x-backport.patch, SOLR-3134-shard-info-fix.patch, 
> SOLR-3134-shard-info.patch
>
>
> For distributed search where each shard represents a logically different 
> index (or physical location), it would be great to know the hit count for 
> each shard.
> In addition, it would be nice to get error info for each shard rather than 
> aborting the whole request when something fails.




[jira] [Commented] (SOLR-1770) move default example core config/data into a collection1 folder

2012-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400975#comment-13400975
 ] 

Jan Høydahl commented on SOLR-1770:
---

+1, but perhaps choose a better name than "collection1" for the "products" 
example core?

> move default example core config/data into a collection1 folder
> ---
>
> Key: SOLR-1770
> URL: https://issues.apache.org/jira/browse/SOLR-1770
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Critical
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-1770.patch
>
>
> This is a better starting point for adding more cores - perhaps we can also 
> get rid of multi-core example




[jira] [Commented] (SOLR-3575) solr.xml should default to persist=true

2012-06-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400995#comment-13400995
 ] 

Jan Høydahl commented on SOLR-3575:
---

+1 

bq. related: it would be nice if the coming/going of cores didn't necessitate 
changing solr.xml. Perhaps every directory under the solr home could implicitly 
define a collection.

Yea, been thinking the same. How about a "my-core/collection.properties" file 
to set the various props? If the file exists but is empty, it could be a simple 
marker that this is a core, with defaults assumed for all settings. And how about 
adding SOLR_HOME/lib as a shared lib folder by convention and SOLR_HOME/core/lib 
as a core-specific lib folder? We might then not need solr.xml at all?
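
A minimal Java sketch of the marker-file idea, assuming that a directory counts as a core only if it contains "collection.properties" and that the only default is the core name taken from the directory name. The class, the `name` property and the `demo()` helper are made up for illustration; nothing here is existing Solr code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch of convention-based core discovery.
class CoreDiscoverySketch {
    static final String MARKER = "collection.properties";

    // Maps core name -> settings for every directory under solrHome with a marker file.
    static Map<String, Properties> discover(Path solrHome) throws IOException {
        Map<String, Properties> cores = new TreeMap<>();
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(solrHome, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                Path marker = dir.resolve(MARKER);
                if (!Files.isRegularFile(marker)) {
                    continue; // e.g. SOLR_HOME/lib: a plain folder, not a core
                }
                String dirName = dir.getFileName().toString();
                Properties defaults = new Properties();
                defaults.setProperty("name", dirName); // assumed default
                Properties props = new Properties(defaults);
                try (InputStream in = Files.newInputStream(marker)) {
                    props.load(in); // an empty file simply leaves the defaults in place
                }
                cores.put(dirName, props);
            }
        }
        return cores;
    }

    // Builds a throwaway solr home with one core and one lib folder, then scans it.
    static List<String> demo() {
        try {
            Path home = Files.createTempDirectory("solr-home");
            Files.createDirectories(home.resolve("lib"));
            Path core = Files.createDirectories(home.resolve("products"));
            Files.createFile(core.resolve(MARKER)); // empty marker file
            return new ArrayList<>(discover(home).keySet());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With discovery done this way, adding or removing a core is a file system operation and no central file has to be rewritten.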

> solr.xml should default to persist=true
> ---
>
> Key: SOLR-3575
>     URL: https://issues.apache.org/jira/browse/SOLR-3575
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0, 5.0
>
>
> The default of false is kind of silly IMO.




[jira] [Commented] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401430#comment-13401430
 ] 

Jan Høydahl commented on SOLR-3467:
---

Great. Are you able to add a JUnit test which reproduces the bug and shows that 
it is fixed?

> ExtendedDismax escaping is missing several reserved characters
> --
>
> Key: SOLR-3467
> URL: https://issues.apache.org/jira/browse/SOLR-3467
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: Michael Dodsworth
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3467.patch
>
>
> When edismax is unable to parse the original user query, it retries using an 
> escaped version of that query (where all reserved chars have been escaped).
> Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
> several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
> '&', '/'}}
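
As a starting point for such a test, here is a standalone Java sketch of an escape routine that covers the four characters named in the description. The reserved-character set is an assumption modelled on what Lucene 4.0's {{QueryParserBase#escape(String)}} handles; the class is hypothetical and does not call into Lucene or edismax.

```java
// Hypothetical standalone escaper; the character set is an assumption.
class EdismaxEscapeSketch {
    // Includes '\\', '|', '&' and '/', the characters reported as missing.
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every reserved character with a backslash.
    static String escape(String query) {
        StringBuilder sb = new StringBuilder(query.length() + 8);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("AC/DC && rock || c:\\music"));
    }
}
```

A JUnit test for the real fix would feed the same kind of input through edismax and check that the retry with the escaped query parses.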




[jira] [Updated] (SOLR-3477) SOLR does not start up when no cores are defined

2012-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3477:
--

Attachment: SOLR-3477-3_6.patch

Updated 3.6 patch which simply removes the zero cores check/throw - this 
mirrors current trunk/4.0 code.

> SOLR does not start up when no cores are defined
> 
>
> Key: SOLR-3477
> URL: https://issues.apache.org/jira/browse/SOLR-3477
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6
> Environment: All environments
>Reporter: Sebastian Schaffert
>Assignee: Tommaso Teofili
>Priority: Critical
> Fix For: 4.0
>
> Attachments: SOLR-3477-3_6.patch, SOLR-3477-3_6.patch, SOLR-3477.patch
>
>
> Since version 3.6.0, Solr does not start up when no cores are defined in 
> solr.xml. The problematic code is in CoreContainer.java, lines 171-173.
> org.apache.solr.common.SolrException: No cores were created, please check the 
> logs for errors
>   at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
>  ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) 
> ~[solr-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:34:38]
> ...
> In our case, this is however a valid situation, because we create the cores 
> programmatically by calling the webservices to register new cores. The server 
> is initially started with no cores defined, and depending on the 
> configuration of our application, cores are then created dynamically.
> For the time being, we have to stick with version 3.5, which did not have 
> this problem (or feature).




[jira] Commented: (SOLR-1924) Solr's updateRequestHandler does not have a fast way of guaranteeing document delivery

2010-10-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921919#action_12921919
 ] 

Jan Høydahl commented on SOLR-1924:
---

In a multi-node environment, it would also be useful to track whether a batch 
has been replicated to the slaves. If a master crashes, the feeding client may 
already have received a callback that a batch is secured even though it was not 
yet replicated, i.e. the only copy was on the now-crashed master. The master 
should be able to keep track of whether at least one replica has fetched a 
certain version of the index through the 
ReplicationHandler. In this way, a client could choose to act on the 
replication status instead of persisted status. The  operation would 
now return an additional state:
fooBar fooBar0001 
fooBar0002 fooBar0003

> Solr's updateRequestHandler does not have a fast way of guaranteeing document 
> delivery
> --
>
> Key: SOLR-1924
> URL: https://issues.apache.org/jira/browse/SOLR-1924
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Karl Wright
>
> It is currently not possible, without performing a commit on every document, 
> to use updateRequestHandler to guarantee delivery into the index of any 
> document.  The reason is that whenever Solr is restarted, some or all 
> documents that have not been committed yet are dropped on the floor, and 
> there is no way for a client of updateRequestHandler to know which ones this 
> happened to.
> I believe it is not even possible to write a middleware-style layer that 
> stores documents and performs periodic commits on its own, because the update 
> request handler never ACKs individual documents on a commit, but merely 
> everything it has seen since the last time Solr bounced.  So you have this 
> potential scenario:
> - middleware layer receives document 1, saves it
> - middleware layer receives document 2, saves it
> Now it's time for the commit, so:
> - middleware layer sends document 1 to updateRequestHandler
> - solr is restarted, dropping all uncommitted documents on the floor
> - middleware layer sends document 2 to updateRequestHandler
> - middleware layer sends COMMIT to updateRequestHandler, but solr adds only 
> document 2 to the index
> - middleware believes incorrectly that it has successfully committed both 
> documents
> An ideal solution would be for Solr to separate the semantics of commit (the 
> index building variety) from the semantics of commit (the 'I got the 
> document' variety).  Perhaps this will involve a persistent document queue 
> that will persist over a Solr restart.
> An alternative mechanism might be for updateRequestHandler to acknowledge 
> specifically committed documents in its response to an explicit commit.  But 
> this would make it difficult or impossible to use autocommit usefully in such 
> situations.  The only other alternative is to require clients that need 
> guaranteed delivery to commit on every document, with a considerable 
> performance penalty.
> This ticket is related to LCF in that LCF is one of the clients that really 
> needs some kind of guaranteed delivery mechanism.
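
The scenario above, together with the "acknowledge specifically committed documents" alternative, can be simulated with a small in-memory model. This Java sketch is purely illustrative: the class is a hypothetical stand-in for the server, and a commit that returns document ids is the proposed behaviour, not something updateRequestHandler does today.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of an index whose commit reports what it committed.
class AckingIndexSketch {
    private final List<String> pending = new ArrayList<>();     // lost on restart
    private final Set<String> committed = new LinkedHashSet<>(); // survives restart

    void add(String docId) {
        pending.add(docId);
    }

    // Models a bounce: everything not yet committed is dropped.
    void restart() {
        pending.clear();
    }

    // Commits pending documents and returns exactly the ids made durable.
    List<String> commit() {
        List<String> acked = new ArrayList<>(pending);
        committed.addAll(pending);
        pending.clear();
        return acked;
    }

    boolean isCommitted(String docId) {
        return committed.contains(docId);
    }

    // Replays the scenario from the description; returns what the middleware must resend.
    static List<String> scenario() {
        AckingIndexSketch solr = new AckingIndexSketch();
        List<String> unacked = new ArrayList<>();
        solr.add("doc1");
        unacked.add("doc1");
        solr.restart(); // doc1 is silently dropped
        solr.add("doc2");
        unacked.add("doc2");
        unacked.removeAll(solr.commit()); // only doc2 is acknowledged
        return unacked;
    }

    public static void main(String[] args) {
        System.out.println("must resend: " + scenario());
    }
}
```

Because the commit response names the documents it covered, the middleware can see that document 1 was never acknowledged and send it again, instead of assuming both were committed.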

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2230) solrj: submitting more that one stream/file via CommonsHttpSolrServer fails

2010-11-10 Thread JIRA
solrj: submitting more that one stream/file via CommonsHttpSolrServer fails
---

 Key: SOLR-2230
 URL: https://issues.apache.org/jira/browse/SOLR-2230
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4.1
Reporter: Stephan GÜnther


If you are using an HTTP-client (CommonsHttpSolrServer) to connect to Solr, you 
are unable to push more than one File/Stream over the wire. 
For example, if you call 
ContentStreamUpdateRequest.addContentStream()/.addFile() twice to index both 
files via Tika, you get the following exception at your Solr server:

15:48:59 [ERROR] http-8983-1 [org.apache.solr.core.SolrCore] - 
org.apache.solr.common.SolrException: missing content stream
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:49)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

It seems that the POST body sent by CommonsHttpSolrServer is not correct.
If you push only one file, everything works as expected.




[jira] Updated: (SOLR-2230) solrj: submitting more that one stream/file via CommonsHttpSolrServer fails

2010-11-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan GÜnther updated SOLR-2230:
--

Attachment: 0001-solrj-fix-submitting-more-that-one-stream-via-multip.patch

I attached a patch to fix the problem, intended for inclusion.
Comments/feedback welcome.

> solrj: submitting more that one stream/file via CommonsHttpSolrServer fails
> ---
>
> Key: SOLR-2230
> URL: https://issues.apache.org/jira/browse/SOLR-2230
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.4.1
>Reporter: Stephan GÜnther
> Attachments: 
> 0001-solrj-fix-submitting-more-that-one-stream-via-multip.patch
>
>
> If you are using an HTTP-client (CommonsHttpSolrServer) to connect to Solr, 
> you are unable to push more than one File/Stream over the wire. 
> For example, if you call 
> ContentStreamUpdateRequest.addContentStream()/.addFile() twice to index both 
> files via Tika, you get the following exception at your Solr server:
> 15:48:59 [ERROR] http-8983-1 [org.apache.solr.core.SolrCore] - 
> org.apache.solr.common.SolrException: missing content stream
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:49)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>   at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>   at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>   at java.lang.Thread.run(Thread.java:619)
> It seems that the POST body sent by CommonsHttpSolrServer is not correct.
> If you push only one file, everything works as expected.




[jira] Updated: (SOLR-2230) solrj: submitting more than one stream/file via CommonsHttpSolrServer fails

2010-11-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan GÜnther updated SOLR-2230:
--

Summary: solrj: submitting more than one stream/file via 
CommonsHttpSolrServer fails  (was: solrj: submitting more that one stream/file 
via CommonsHttpSolrServer fails)

> solrj: submitting more than one stream/file via CommonsHttpSolrServer fails
> ---
>
> Key: SOLR-2230
> URL: https://issues.apache.org/jira/browse/SOLR-2230
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.4.1
>Reporter: Stephan GÜnther
> Attachments: 
> 0001-solrj-fix-submitting-more-that-one-stream-via-multip.patch
>
>
> If you are using an HTTP-client (CommonsHttpSolrServer) to connect to Solr, 
> you are unable to push more than one File/Stream over the wire. 
> For example, if you call 
> ContentStreamUpdateRequest.addContentStream()/.addFile() twice to index both 
> files via Tika, you get the following exception at your Solr server:
> 15:48:59 [ERROR] http-8983-1 [org.apache.solr.core.SolrCore] - 
> org.apache.solr.common.SolrException: missing content stream
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:49)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>   at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>   at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>   at java.lang.Thread.run(Thread.java:619)
> It seems that the POST body sent by CommonsHttpSolrServer is not correct.
> If you push only one file, everything works as expected.




[jira] Commented: (SOLR-2045) DIH doesn't release jdbc connections in conjunction with DB2

2010-11-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933479#action_12933479
 ] 

Kjetil Ødegaard commented on SOLR-2045:
---

We see the same issue on Oracle (11g).

> DIH doesn't release jdbc connections in conjunction with DB2 
> -
>
> Key: SOLR-2045
> URL: https://issues.apache.org/jira/browse/SOLR-2045
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1
> Environment: DB2 SQLLIB 9.5, 9.7 jdbc Driver
>Reporter: Fenlor Sebastia
>
> Using the JdbcDataSource in conjunction with the DB2 JDBC drivers results in 
> the following error when the DIH tries to close the connection, due to 
> active transactions. As a consequence, each delta import or full import opens 
> a new connection without closing it, so the maximum number of connections 
> is soon reached. Setting the connection to readOnly or changing the 
> transaction isolation level doesn't help either.
> The JDBC driver I used: "com.ibm.db2.jcc.DB2Driver", relying on db2jcc4.jar 
> shipped with DB2 Express 9.7, for example.
> Here is the stack trace...
> 14.08.2010 01:49:51 org.apache.solr.handler.dataimport.JdbcDataSource 
> closeConnection
> FATAL: Ignoring Error when closing connection
> com.ibm.db2.jcc.am.SqlException: [jcc][10251][10308][4.8.87] 
> java.sql.Connection.close() requested while a transaction is in progress on 
> the connection.The transaction remains active, and the connection cannot be 
> closed. ERRORCODE=-4471, SQLSTATE=null
>   at com.ibm.db2.jcc.am.gd.a(gd.java:660)
>   at com.ibm.db2.jcc.am.gd.a(gd.java:60)
>   at com.ibm.db2.jcc.am.gd.a(gd.java:120)
>   at com.ibm.db2.jcc.am.lb.u(lb.java:1202)
>   at com.ibm.db2.jcc.am.lb.x(lb.java:1225)
>   at com.ibm.db2.jcc.am.lb.v(lb.java:1211)
>   at com.ibm.db2.jcc.am.lb.close(lb.java:1195)
>   at com.ibm.db2.jcc.uw.UWConnection.close(UWConnection.java:838)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
>   at 
> org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
>   at 
> org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173)
>   at 
> org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339)
>   at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>   at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
> Well, the issue can be solved by invoking a commit or rollback directly before 
> the connection.close() statement. Here is the code snippet with the changes I 
> made in JdbcDataSource.java:
>   private void closeConnection() {
>     try {
>       if (conn != null) {
>         // End any open transaction first; DB2 refuses to close a
>         // connection while a transaction is still in progress.
>         if (conn.isReadOnly()) {
>           LOG.info("connection is readonly, therefore rollback");
>           conn.rollback();
>         } else {
>           LOG.info("connection is not readonly, therefore commit");
>           conn.commit();
>         }
>         conn.close();
>       }
>     } catch (Exception e) {
>       LOG.error("Ignoring Error when closing connection", e);
>     }
>   }




[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Attachment: SOLR-1979.patch

First raw patch implementing language identification.

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Priority: Minor
> Attachments: SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we should wrap the [Nutch 
> LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
>  in an UpdateProcessor. The processor should be configured like this:
> {code:xml} 
>class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
> title,teaser,body
> language
> language_display
> 
> {code} 




[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Description: 
We need the ability to detect language of some random text in order to act upon 
it, such as indexing the content into language aware fields. Another usecase is 
to be able to filter/facet on language on random unstructured content.

To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
processor is configurable like this:

{code:xml} 
  
name,subject
language_s
id
en
  
{code} 

It will then read the text from inputFields name and subject, perform language 
identification and output the ISO code for the detected language in the 
outputField. If no language was detected, fallback language is used.

  was:
We need the ability to detect language of some random text in order to act upon 
it, such as indexing the content into language aware fields. Another usecase is 
to be able to filter/facet on language on random unstructured content.

To do this, we should wrap the [Nutch 
LanguageIdentifier|http://nutch.apache.org/apidocs-1.1/org/apache/nutch/analysis/lang/LanguageIdentifier.html]
 in an UpdateProcessor. The processor should be configured like this:

{code:xml} 
  
title,teaser,body
language
language_display

{code} 


> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Priority: Minor
> Attachments: SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
>class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
> name,subject
> language_s
> id
> en
>   
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.
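
The flow described above (concatenate the input fields, detect, write the ISO code, fall back when nothing is detected) might look roughly like the following Java sketch. It is not taken from the patch: the class is hypothetical, the document is a plain map, and the detector is passed in as a function standing in for the Tika LanguageIdentifier.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the processor's core logic; not the actual UpdateProcessor.
class LanguageFieldSketch {

    // Writes the detected (or fallback) language code into outputField.
    static void tagLanguage(Map<String, Object> doc,
                            List<String> inputFields,
                            String outputField,
                            String fallback,
                            Function<String, Optional<String>> detector) {
        StringBuilder text = new StringBuilder();
        for (String field : inputFields) {
            Object value = doc.get(field);
            if (value != null) {
                text.append(value).append(' ');
            }
        }
        // The detector returns an empty Optional when it is not confident.
        String lang = detector.apply(text.toString().trim()).orElse(fallback);
        doc.put(outputField, lang);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("name", "Hei verden");
        // Toy detector for the demo only: recognises a single Norwegian word.
        Function<String, Optional<String>> toy =
            s -> s.toLowerCase().contains("hei") ? Optional.of("no") : Optional.empty();
        tagLanguage(doc, List.of("name", "subject"), "language_s", "en", toy);
        System.out.println(doc);
    }
}
```

Passing the detector in as a function keeps the field handling and the fallback rule testable without any language model on the classpath.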




[jira] Commented: (SOLR-2244) Add Language Identification support

2010-12-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966734#action_12966734
 ] 

Jan Høydahl commented on SOLR-2244:
---

Added my patch to SOLR-1979. The difference from this patch is that it is based 
on contrib/extraction, is configured in-line instead of through its own config 
file, and has a fallback configuration.

> Add Language Identification support
> ---
>
> Key: SOLR-2244
> URL: https://issues.apache.org/jira/browse/SOLR-2244
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
> Attachments: solr2244.patch
>
>
> For starters, Tika has language identification capabilities that we can 
> likely leverage, but moreover, make it easier for people to plug in language 
> identification into the indexing process.




[jira] Created: (LUCENE-2799) MMapDirectory not designed for inheritance

2010-12-04 Thread JIRA
MMapDirectory not designed for inheritance
--

 Key: LUCENE-2799
 URL: https://issues.apache.org/jira/browse/LUCENE-2799
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.0.3
Reporter: René Treffer


How to reproduce

Try to inherit from MMapDirectory to change the openInput logic (open files 
from different directories).

Expected result:

Inherit from MMapDirectory, override the one method, done.

Actual result:

It's impossible to override the method, as the inner classes would be missing. 
It's impossible to fork the inner classes, as they depend on a final method with 
default visibility (cleanMapping).
The easiest option turns out to be to fork the code completely and replace 
just the method in question.

Possible fix:

Change the visibility of most members and subtypes to be at least protected and 
avoid the default visibility.




[jira] Commented: (LUCENE-2799) MMapDirectory not designed for inheritance

2010-12-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966866#action_12966866
 ] 

René Treffer commented on LUCENE-2799:
--

Ah, right, but FileSwitchDirectory won't solve my problems either as every file 
is in a different directory. A set of MMapDirectories + delegation might help 
here.

But it would still be nice if MMapDirectory were either final or extension 
safe, or if the implicit no-extension policy of openInput were made explicit 
with a final modifier.

> MMapDirectory not designed for inheritance
> --
>
> Key: LUCENE-2799
> URL: https://issues.apache.org/jira/browse/LUCENE-2799
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 3.0.3
>Reporter: René Treffer
>
> How to reproduce
> Try to inherit from MMapDirectory to change the openInput logic (open files 
> from different directories).
> Expected result:
> Inherit from MMapDirectory, override the one method, done.
> Actual result:
> It's impossible to override the method, as the inner classes would be 
> missing. It's impossible to fork the inner classes, as they depend on a final 
> method with default visibility (cleanMapping).
> The easiest option turns out to be to fork the code completely and replace 
> just the method in question.
> Possible fix:
> Change the visibility of most members and subtypes to be at least protected 
> and avoid the default visibility.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966964#action_12966964
 ] 

Jan Høydahl commented on SOLR-1979:
---

Simply allowing the threshold for isReasonablyCertain() to be set is probably 
not enough to get robust detection, because the distance measure is very 
sensitive to the length of the profiles in use. Thus, it is a bit dangerous to 
expose getDistance() as in TIKA-568, because that distance measure is an 
internal value, not well normalized, and bound to change in future versions 
of Tika.

See TIKA-369 and TIKA-496.

I think the right way to go is to solve these two issues first. Once 
getDistance() is no longer biased towards profile length, we can make a new 
isReasonablyCertain() implementation that takes into account the relative 
distance between the first and second candidate languages...
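
To make the relative-distance idea concrete, here is a minimal sketch. It is 
not Tika code: the class RelativeCertainty, its method signature and the 
minRelativeGap parameter are made up for illustration, and it assumes that a 
smaller distance means a closer match.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Tika's LanguageIdentifier. */
final class RelativeCertainty {
    private RelativeCertainty() {}

    /**
     * Returns true when the best candidate is clearly ahead of the runner-up.
     *
     * @param distances      one distance per candidate language (smaller = closer)
     * @param minRelativeGap assumed tuning knob, e.g. 0.1 for a 10% lead
     */
    static boolean isReasonablyCertain(double[] distances, double minRelativeGap) {
        if (distances.length < 2) {
            return false; // nothing to compare the best candidate against
        }
        double[] sorted = distances.clone();
        Arrays.sort(sorted);
        double best = sorted[0];
        double second = sorted[1];
        if (second <= 0.0) {
            return false; // both candidates at distance zero: indistinguishable
        }
        // Lead of the best candidate, relative to the runner-up's distance.
        return (second - best) / second >= minRelativeGap;
    }
}
```

Because the check uses the ratio between the two best distances, multiplying 
all distances by the same factor does not change the outcome.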

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
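> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>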
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966970#action_12966970
 ] 

Jan Høydahl commented on SOLR-1979:
---

The idField input parameter is just used for decent logging if detection fails. 
It would be more elegant to get the id field name automatically through 
SolrCore...

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
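> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>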
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967032#action_12967032
 ] 

Jan Høydahl commented on SOLR-1979:
---

@Robert: Yes, there must be a way to tell whether or not the language even has 
a profile, through some well defined method. It's not important HOW we improve 
detection certainty, but comparing the top n distances could help. I'm also a 
fan of including other metrics than profile similarity if that can help, 
however for unique scripts that will automatically be covered by profile 
similarity. Detailed solution discussions should continue in TIKA-369.

Macro languages: See TIKA-493

It makes sense to allow for detecting languages outside 639-1, and I believe 
RFC3066 and BCP47 are both re-using the 639 codes, so that if there is a 
2-letter code for a language it will be used. 639-1 is what "everyone" already 
knows.

In general, improvements should be done in Tika space, then use those in Solr, 
thus building one strong language detection library.

@Grant: I actually planned to do the regEx based field name mapping in a 
separate UpdateProcessor, to make things more flexible. Example:
{code:xml} 
  
language
(.*?)_lang
$1_$lang
$1_t
de,en,fr,it,es,nl
  
{code} 
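
A rough sketch of how such a mapper could apply the settings above, assuming 
they mean: match field names against (.*?)_lang, rewrite them to $1_$lang for 
supported languages, and fall back to $1_t otherwise. LanguageFieldMapper and 
its constructor arguments are hypothetical names, not an existing Solr class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical mapper; class, method and argument names are illustrative only. */
final class LanguageFieldMapper {
    private final Pattern from;
    private final String toTemplate;       // may contain the $lang placeholder
    private final String fallbackTemplate; // used for unsupported languages
    private final Set<String> supported;

    LanguageFieldMapper(String fromRegex, String toTemplate,
                        String fallbackTemplate, String supportedCsv) {
        this.from = Pattern.compile(fromRegex);
        this.toTemplate = toTemplate;
        this.fallbackTemplate = fallbackTemplate;
        this.supported = new HashSet<>(Arrays.asList(supportedCsv.split(",")));
    }

    /** Returns the mapped field name, or the name unchanged if it does not match. */
    String map(String fieldName, String lang) {
        Matcher m = from.matcher(fieldName);
        if (!m.matches()) {
            return fieldName;
        }
        // Substitute $lang first, so only regex group references such as $1 remain.
        String template = supported.contains(lang)
                ? toTemplate.replace("$lang", Matcher.quoteReplacement(lang))
                : fallbackTemplate;
        return m.replaceFirst(template);
    }
}
```

With the values from the example config, a field title_lang in a document 
detected as "en" would become title_en, while a document in an unsupported 
language would get title_t.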

Your thought of allowing to detect language for individual fields in one go is 
also interesting. I'd love to see metadata support in SolrInputDocument, so 
that one processor could annotate a @language on the fields analyzed. Then next 
processor could act on metadata to rename field...

@Yonik: By allowing regex naming of field names, we give users a generic tool 
to avoid field name clashes by picking the pattern. Mapping multiple 
languages to the same suffix also makes sense.


> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
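> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>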
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967211#action_12967211
 ] 

Jan Høydahl commented on SOLR-1979:
---

@Grant: "I dropped the outputField setting and a number of other settings"

There should be a way to output the language for the whole document to some 
field as some applications need to filter on language.

I like making most things configurable, but with good defaults which fit most 
needs. The default could be to detect a document-wide language from all input 
fields and output this to a "language_s" field, unless you specify params 
docLangInputFields=f1,f2.. and docLangOutputField=nn. Likewise, make it easy to 
disable field renaming.

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
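> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>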
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968627#action_12968627
 ] 

Jan Høydahl commented on SOLR-1979:
---

Allow for both a "language" field and a "languages" (multivalued) field.
If fields are mapped, the new name reflects the language, so I don't know if we 
need a field->lang mapping.
However, have you considered extending the document model to allow metadata per 
field? Then @language would be a valid field metadata, mostly as a means for 
later processing to pick up and act on. This can be a valuable mechanism for 
other inter processor communication as well as to pass info between document 
centric processing and Analysis.

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
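> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>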
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968806#action_12968806
 ] 

Jan Høydahl commented on SOLR-1979:
---

Discussion on the process for adding language profiles to TIKA should be 
continued in TIKA-546

I have a plan to add profiles for the Norwegian and Sami languages when time 
allows: TIKA-491 TIKA-492

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
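> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>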
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968820#action_12968820
 ] 

Jan Høydahl commented on SOLR-1979:
---

>>I have a plan to add profiles for the Norwegian and Sami languages when time 
>>allows: TIKA-491 TIKA-492
>Did you plan to also upgrade tika from 639-1 for the Sami languages? the only 
>639-1 code i see is "se" but this seems to be appropriate only for North Sami.

Exactly. That's one example which will need a wider range of codes. I was 
planning to use 639-2 for those that do not have a 2-letter code, but BCP47 it 
will be now (although the end result may be more or less the same)

We also need to detect whether a language is part of a macro language, and add 
both to languages multivalue field, because it should be possible to filter on 
Norwegian (no) without specifying both nn and nb, and also for sami (smi) 
without specifying all of the specific languages.
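
As a sketch of what filling the multivalued languages field could look like: 
the class MacroLanguages and its small lookup table are assumptions made for 
illustration (only the nn/nb -> no and se -> smi groupings come from the 
discussion above), not an existing Tika or Solr API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup; a real table would be far larger. */
final class MacroLanguages {
    // Specific language code -> broader code it should also be filterable by.
    private static final Map<String, String> MACRO = new HashMap<>();
    static {
        MACRO.put("nn", "no");  // Norwegian Nynorsk
        MACRO.put("nb", "no");  // Norwegian Bokmaal
        MACRO.put("se", "smi"); // North Sami, grouped under Sami
    }

    private MacroLanguages() {}

    /** Values for the multivalued field: the detected code plus its group, if any. */
    static Set<String> expand(String detected) {
        Set<String> values = new LinkedHashSet<>();
        values.add(detected);
        String macro = MACRO.get(detected);
        if (macro != null) {
            values.add(macro);
        }
        return values;
    }
}
```

A document detected as nn would then carry both nn and no, so a filter on 
"no" matches it without listing nn and nb explicitly.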

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
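> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>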
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Created: (SOLR-2281) Error: Invalid value 'explicit' for echoParams parameter, use 'EXPLICIT' or 'ALL'

2010-12-11 Thread JIRA
Error: Invalid value 'explicit' for echoParams parameter, use 'EXPLICIT' or 
'ALL'
-

 Key: SOLR-2281
 URL: https://issues.apache.org/jira/browse/SOLR-2281
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1
Reporter: Başar Aykut


The error "Invalid value 'explicit' for echoParams parameter, use 'EXPLICIT' or 
'ALL'" is displayed when the default config file is used. In the config file the 
echoParams value is 'explicit'; however, in the Turkish locale the uppercase of 
the word 'explicit' is EXPLİCİT, and this doesn't match the word 'EXPLICIT'. 

toUpperCase(Locale.ENGLISH) can be used instead of using it with the default 
locale:

{code}
  public enum EchoParamStyle {
    EXPLICIT,
    ALL,
    NONE;

    public static EchoParamStyle get( String v ) {
      if( v != null ) {
        v = v.toUpperCase();
        if( v.equals( "EXPLICIT" ) ) {
          return EXPLICIT;
        }
        if( v.equals( "ALL") ) {
          return ALL;
        }
        if( v.equals( "NONE") ) {  // the same as nothing...
          return NONE;
        }
      }
      return null;
    }
  };
{code}
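
The suggested change could look roughly like this. It is a sketch of the fix 
rather than a committed patch; apart from restructuring for brevity, the only 
behavioural difference from the code above is the explicit Locale.ENGLISH 
argument, which keeps the upper-casing independent of the JVM default locale.

```java
import java.util.Locale;

enum EchoParamStyle {
    EXPLICIT,
    ALL,
    NONE;

    static EchoParamStyle get(String v) {
        if (v == null) {
            return null;
        }
        // Fixed locale: 'i' always upper-cases to 'I', even on a Turkish JVM.
        String upper = v.toUpperCase(Locale.ENGLISH);
        if (upper.equals("EXPLICIT")) {
            return EXPLICIT;
        }
        if (upper.equals("ALL")) {
            return ALL;
        }
        if (upper.equals("NONE")) {
            return NONE;
        }
        return null;
    }
}
```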




[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971338#action_12971338
 ] 

Jan Høydahl commented on SOLR-1979:
---

{quote}
Jan, do you have any updates to the patch? I'd like to move forward with the 
basic functionality at least, but I still think we need the field mapping 
stuff, or we should punt all field mapping stuff to another processor. WDYT?
{quote}

I don't have any updates.

Keep it basic in first version. Allow for per-document and per-field detection.

Make field-mapping configurable and optional (default off), allowing people to 
chain in their own mapper downstream if they choose.

Mixed-language per field is a different beast and should be dealt with 
later. It probably requires analysis changes as well if we want analyzers to 
pick up language from payloads or something.

My 2 cents

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> We need the ability to detect language of some random text in order to act 
> upon it, such as indexing the content into language aware fields. Another 
> usecase is to be able to filter/facet on language on random unstructured 
> content.
> To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
> processor is configurable like this:
> {code:xml} 
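> <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
>   <str name="inputFields">name,subject</str>
>   <str name="outputField">language_s</str>
>   <str name="idField">id</str>
>   <str name="fallback">en</str>
> </processor>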
> {code} 
> It will then read the text from inputFields name and subject, perform 
> language identification and output the ISO code for the detected language in 
> the outputField. If no language was detected, fallback language is used.




[jira] Commented: (SOLR-1526) Client Side Tika integration

2010-12-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973058#action_12973058
 ] 

Tomás Fernández Löbbe commented on SOLR-1526:
-

I have a possible implementation for this jira. I created a class 
SolrFileInputDocument that extends SolrInputDocument, the main difference is 
that it contains the methods:

public void addFile(InputStream file)

and 

public void addFile(InputStream file , Metadata metadata)


These two methods use Tika to extract the content and end up creating 
fields (this.addField(...)) on the parent class SolrInputDocument. 
SolrFileInputDocument accepts a Map instance that maps the extracted metadata 
to Solr fields, something like this:

Map map = new HashMap();
map.put("content", "text");
map.put("keywords", "cat");
map.put("creator", "manu");
SolrFileInputDocument document = new SolrFileInputDocument(map);

I added the classes to another "contrib" directory, I don't know if this should 
be done this way, I just didn't want to add a dependency with Tika that might 
be not always needed.  Adding this code to a client application would require 
to add the SolrJ jar plus the "clientextraction" jar

I still haven't done anything to keep the "prefix" feature of the 
ExtractingRequestHandler (which I don't think is going to be difficult), and I 
still don't handle non-text fields like dates, but I could do it if you think 
this is a good approach.

Do you think this could work? I can upload the code tomorrow.

> Client Side Tika integration
> 
>
> Key: SOLR-1526
> URL: https://issues.apache.org/jira/browse/SOLR-1526
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the 
> wire.  The contrib/extraction library has server side integration with Tika, 
> but it would be nice to have a client side implementation as well.  It should 
> support both metadata and content or just metadata.




[jira] Commented: (SOLR-1526) Client Side Tika integration

2010-12-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974176#action_12974176
 ] 

Jan Høydahl commented on SOLR-1526:
---

I linked this issue to SOLR-1763, as they attempt to solve the same thing, on 
client vs server side.

Instead of creating two solutions, we should base these two on the same code 
base and config, so that it is easy to switch between them. Perhaps someone 
starts with server-side extraction but then wants to optimize performance by 
going client-side. The switch should be intuitive.

Thus, should we consider porting the whole UpdateProcessorChain to SolrJ? How 
cool would it be to choose whether to execute an UP on the client or the server 
side simply by a configuration change? I realize that some UPs may depend on 
SolrCore or have other difficult dependencies, but it should be possible to 
work around that, no?

> Client Side Tika integration
> 
>
> Key: SOLR-1526
> URL: https://issues.apache.org/jira/browse/SOLR-1526
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the 
> wire.  The contrib/extraction library has server side integration with Tika, 
> but it would be nice to have a client side implementation as well.  It should 
> support both metadata and content or just metadata.




[jira] Created: (SOLR-2293) SolrCloud distributed indexing

2010-12-22 Thread JIRA
SolrCloud distributed indexing
--

 Key: SOLR-2293
 URL: https://issues.apache.org/jira/browse/SOLR-2293
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Jan Høydahl


Add SolrCloud support for distributed indexing, as described in 
http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing and the 
"Support user specified partitioning" paragraph of 
http://wiki.apache.org/solr/SolrCloud#High_level_design_goals

Currently, the client needs to decide which shard indexer to talk to for each 
document. Common partitioning strategies include hash-based, date-based and 
"custom".

Solr should have the capability of accepting a document update on any of the 
nodes in a cluster, and perform partitioning and distribution of updates to 
correct shard, based on current ZK config. The ShardDistributionPolicy should 
be pluggable, with the most common provided out of the box.
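
A minimal sketch of what a pluggable policy with a hash-based default could 
look like. Only the name ShardDistributionPolicy comes from the description 
above; the method signature, the HashShardDistributionPolicy class and the 
choice of CRC32 as the hash are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical plug-in point: decides which shard a document belongs to. */
interface ShardDistributionPolicy {
    /** Returns a shard index in the range [0, numShards). */
    int shardFor(String docId, int numShards);
}

/** Hash-based policy: the same id always lands on the same shard. */
final class HashShardDistributionPolicy implements ShardDistributionPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        return (int) (crc.getValue() % numShards);
    }
}
```

Any node receiving an update could then ask the configured policy for the 
target shard and forward the document there. Date-based or custom policies 
would be further implementations of the same interface.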




[jira] Commented: (SOLR-1526) Client Side Tika integration

2011-01-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978474#action_12978474
 ] 

Tomás Fernández Löbbe commented on SOLR-1526:
-

I'm sorry, I saw some comments about the UpdateProcessors, but I couldn't find 
enough documentation. Is this a new component? Is it documented somewhere?
I saw you've been working on SOLR-1763; do you have anything for it yet?


> Client Side Tika integration
> 
>
> Key: SOLR-1526
>     URL: https://issues.apache.org/jira/browse/SOLR-1526
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the 
> wire.  The contrib/extraction library has server side integration with Tika, 
> but it would be nice to have a client side implementation as well.  It should 
> support both metadata and content or just metadata.




[jira] Commented: (SOLR-1526) Client Side Tika integration

2011-01-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978588#action_12978588
 ] 

Jan Høydahl commented on SOLR-1526:
---

Nope, I have not started on 1763 yet. 

> Client Side Tika integration
> 
>
> Key: SOLR-1526
> URL: https://issues.apache.org/jira/browse/SOLR-1526
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the 
> wire.  The contrib/extraction library has server side integration with Tika, 
> but it would be nice to have a client side implementation as well.  It should 
> support both metadata and content or just metadata.




[jira] Commented: (SOLR-1526) Client Side Tika integration

2011-01-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978825#action_12978825
 ] 

Tomás Fernández Löbbe commented on SOLR-1526:
-

Now I get what you say about the UpdateRequestProcessor (I thought you were 
talking about a different/new component). I like the idea of reusing the code, 
but I don't like the idea of adding complexity to SolrJ. Is it worth porting 
the UpdateRequestProcessorChain to SolrJ? I definitely wouldn't like to have a 
configuration file in the SolrJ API.

> Client Side Tika integration
> 
>
> Key: SOLR-1526
> URL: https://issues.apache.org/jira/browse/SOLR-1526
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the 
> wire.  The contrib/extraction library has server side integration with Tika, 
> but it would be nice to have a client side implementation as well.  It should 
> support both metadata and content or just metadata.




[jira] Commented: (SOLR-236) Field collapsing

2011-01-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979193#action_12979193
 ] 

Samuel García Martínez commented on SOLR-236:
-

The NPE noticed by Shekhar Nirkhe is caused by errors in the filter query 
cache and in the signature key that is used to store cached results. 

To sum up: if you perform a filter query and then perform the same query with 
field collapsing, the query result is already cached, but not in the form this 
component expects. As a result, the DocSet implementation is not the expected 
one and, because the result is served from the cache, the DocumentCollector is 
never executed.

As soon as I can, I'll post a patch that caches results under a combined key 
formed by the collector class and the query itself.
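
The combined-key idea can be sketched roughly as below. This is only a Python illustration of the principle, not the patch itself: the names CacheKey and cached, the plain dict used as the cache, and the example query string are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CacheKey:
    """Hypothetical cache key: the query plus the collector that built the result."""
    query: str
    collector: str  # collector class name; empty string for a plain filter query


def cached(cache: Dict[CacheKey, object], key: CacheKey,
           compute: Callable[[], object]) -> object:
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]


cache: Dict[CacheKey, object] = {}
calls: List[str] = []


def run_plain() -> str:
    calls.append("plain")
    return "plain DocSet"


def run_collapsed() -> str:
    calls.append("collapse")
    return "collapsed DocSet"


# Same query text, different collector: the two results no longer collide.
plain = cached(cache, CacheKey("type:book", ""), run_plain)
collapsed = cached(cache, CacheKey("type:book", "DocSetScoreCollector"), run_collapsed)
```

With the query alone as the key, the second lookup would hit the entry stored by the plain filter query and the collector would never run, which matches the symptom described above.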

Colbenson - Findability Experts 
http://www.colbenson.es/



> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: Next
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
> field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> quasidistributed.additional.patch, 
> SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
> SOLR-236-distinctFacet.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache

2011-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979588#action_12979588
 ] 

Christian Kohlschütter commented on LUCENE-2500:


I guess it would not be difficult to add Mac OS X support (via F_NOCACHE)?

see http://evanjones.ca/write-latency-alignment.html


> A Linux-specific Directory impl that bypasses the buffer cache
> --
>
> Key: LUCENE-2500
> URL: https://issues.apache.org/jira/browse/LUCENE-2500
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2500.patch
>
>
> I've been testing how we could prevent Lucene's merges from evicting
> pages from the OS's buffer cache.  I tried fadvise/madvise (via JNI)
> but (frustratingly), I could not get them to work (details at
> http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html).
> The only thing that worked was to use Linux's O_DIRECT flag, which
> forces all IO to bypass the buffer cache entirely... so I created a
> Linux-specific Directory impl to do this.




[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2

2012-08-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3707:
--

Fix Version/s: 5.0

> Upgrade Solr to Tika 1.2
> 
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 5.0, 4.0
>
> Attachments: SOLR-3707.patch, SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: 
> http://tika.apache.org/1.2/index.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3691:
--

Attachment: SOLR-3691.patch

New patch. This totally reorganizes the code to make it testable and adds a 
bunch of unit tests.

Also added basic robots.txt support, so that we don't offend anyone.

[~lancenorskog], can you take it for a test ride?

> SimplePostTool: Mode for indexing a web page
> 
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Commented] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435851#comment-13435851
 ] 

Jan Høydahl commented on SOLR-3691:
---

Here's the new help screen including "web" mode, "depth" and "delay" support:
{noformat}
SimplePostTool version 1.5
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> 
[<file|folder|url|arg>...]]

Supported System Properties and their defaults:
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] 
(default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

This is a simple command line tool for POSTing raw data to a Solr
port.  Data can be read from files specified as commandline args,
URLs specified as args, as raw commandline arg strings or via STDIN.
Examples:
  java -jar post.jar *.xml
  java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
  java -Ddata=web -jar post.jar http://example.com/
  java -Dtype=text/csv -jar post.jar *.csv
  java -Dtype=application/json -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a 
-Dtype=application/pdf -jar post.jar a.pdf
  java -Dauto -jar post.jar *
  java -Dauto -Drecursive -jar post.jar afolder
  java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
The options controlled by System Properties include the Solr
URL to POST to, the Content-Type of the data, whether a commit
or optimize should be executed, and whether the response should
be written to STDOUT. If auto=yes the tool will try to set type
and url automatically from file name. When posting rich documents
the file name will be propagated as "resource.name" and also used
as "literal.id". You may override these or any other request parameter
through the -Dparams property. To do a commit only, use "-" as argument.
The web mode is a simple crawler following links within domain, default 
delay=10s.
{noformat}

> SimplePostTool: Mode for indexing a web page
> ----
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Commented] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436310#comment-13436310
 ] 

Jan Høydahl commented on SOLR-3691:
---

bq. Maybe this deserves a rename of *Simple*PostTool to just PostTool now that 
it's not so simple any more?  

Sure, I know :) It's more code, but I hope the logic is actually simpler to 
follow now than before, since it's better structured. Besides, we only use 
standard SDK functions, so it is still self-contained without extra deps, which 
is a major part of the *Simple* name. Also, since much of the code has moved 
out of main() and into the class, it is easier for folks to use it from their 
own code should they wish.

> SimplePostTool: Mode for indexing a web page
> 
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping

2012-08-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436364#comment-13436364
 ] 

Jan Høydahl commented on SOLR-3735:
---

Thanks, I could not find an easy way to initialize that map inside of Velocity 
- this is indeed a better way.
+1

> Relocate the example mime-to-extension mapping
> --
>
> Key: SOLR-3735
> URL: https://issues.apache.org/jira/browse/SOLR-3735
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0-BETA
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3735.patch
>
>
> A mime-to-extension mapping was added to VelocityResponseWriter recently.  
> This really belongs in the templates themselves, not in VrW, as it is 
> specific to the example search results not meant for all VrW templates.




[jira] [Commented] (SOLR-3721) Multiple concurrent recoveries of same shard?

2012-08-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436383#comment-13436383
 ] 

Jan Høydahl commented on SOLR-3721:
---

ElasticSearch has some settings to control when recovery starts after a cluster 
restart, see 
[Guide|http://www.elasticsearch.org/guide/reference/modules/gateway/]. This 
approach looks reasonable. If we know that we expect N nodes in our cluster, we 
can start recovery when we see N nodes up. If fewer than N nodes are up, we wait 
for X time (running on local data, not accepting new updates) before recovery 
and leader election start.
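
The gating rule described above could look roughly like this. It is a minimal Python sketch of the decision only; the function name and parameter names are made up for illustration and do not correspond to existing Solr or ElasticSearch settings.

```python
def should_start_recovery(nodes_up: int, expected_nodes: int,
                          seconds_waited: float, max_wait_seconds: float) -> bool:
    """Decide whether recovery and leader election may begin.

    Hypothetical rule: start at once when all N expected nodes are up,
    otherwise keep serving local data until the wait time X has passed.
    """
    if nodes_up >= expected_nodes:
        return True
    return seconds_waited >= max_wait_seconds


# All 7 expected nodes are up: start immediately.
start_now = should_start_recovery(7, 7, seconds_waited=0.0, max_wait_seconds=300.0)

# Only 5 of 7 nodes are up after 10 seconds: keep waiting.
keep_waiting = not should_start_recovery(5, 7, seconds_waited=10.0, max_wait_seconds=300.0)

# Still 5 of 7 after the full wait time: give up waiting and start anyway.
start_after_timeout = should_start_recovery(5, 7, seconds_waited=300.0, max_wait_seconds=300.0)
```

While the function returns False the node would run on local data and reject new updates, as suggested in the comment.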

> Multiple concurrent recoveries of same shard?
> -
>
> Key: SOLR-3721
> URL: https://issues.apache.org/jira/browse/SOLR-3721
> Project: Solr
>  Issue Type: Bug
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Using our own Solr release based on Apache revision 
> 1355667 from 4.x branch. Our changes to the Solr version is our solutions to 
> TLT-3178 etc., and should have no effect on this issue.
>Reporter: Per Steffensen
>  Labels: concurrency, multicore, recovery, solrcloud
> Fix For: 4.0
>
> Attachments: recovery_in_progress.png, recovery_start_finish.log
>
>
> We run a performance/endurance test on a 7 Solr instance SolrCloud setup and 
> eventually Solrs lose ZK connections and go into recovery. BTW the recovery 
> often does not ever succeed, but we are looking into that. While doing that I 
> noticed that, according to logs, multiple recoveries are in progress at the 
> same time for the same shard. That cannot be intended and I can certainly 
> imagine that it will cause some problems.
> It is just the logs that are wrong, did I make some mistake, or is this a 
> real bug?
> See attached grep from log, grepping only on "Finished recovery" and 
> "Starting recovery" logs.
> {code}
> grep -B 1 "Finished recovery\|Starting recovery" solr9.log solr8.log 
> solr7.log solr6.log solr5.log solr4.log solr3.log solr2.log solr1.log 
> solr0.log > recovery_start_finish.log
> {code}
> It can be hard to get an overview of the log, but I have generated a graph 
> showing (based alone on "Started recovery" and "Finished recovery" logs) how 
> many recoveries are in progress at any time for the different shards. See 
> attached recovery_in_progress.png. The graph is also a little hard to get an 
> overview of (due to the many shards) but it is clear that for several shards 
> there are multiple recoveries going on at the same time, and that several 
> recoveries never succeed.
> Regards, Per Steffensen




[jira] [Updated] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3691:
--

Attachment: SOLR-3691.patch

Last update:
* Fixed typo in usage
* Fixed ArrayIndexOutOfBounds when robots.txt contains only a # on one line
* No longer prints redirect warnings for every page on a site, just the first
* No longer throws exception when robots.txt does not exist for a domain :)

I'll commit this to trunk and we can iterate from there.
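
For readers who want to see the kind of robots.txt handling the fixes above concern, here is a small Python sketch. It is not the SimplePostTool code: the function name parse_disallows is hypothetical, and the sketch only honours the "User-agent: *" group. It shows one way to tolerate a line that holds nothing but "#" and to treat a missing file as an empty rule set.

```python
from typing import List


def parse_disallows(robots_txt: str) -> List[str]:
    """Collect Disallow paths that apply to all user agents ("*").

    Comments are stripped first, so a line containing only "#" becomes
    empty and is skipped instead of being indexed into.
    """
    disallows: List[str] = []
    applies = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing or whole-line comments
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            applies = (value == "*")
        elif field == "disallow" and applies and value:
            disallows.append(value)
    return disallows


sample = (
    "# crawl rules\n"
    "#\n"
    "User-agent: *\n"
    "Disallow: /private # internal pages\n"
    "Disallow:\n"
    "\n"
    "User-agent: other\n"
    "Disallow: /x\n"
)
rules = parse_disallows(sample)

# A domain without robots.txt can be modelled as an empty file: no rules.
no_rules = parse_disallows("")
```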

> SimplePostTool: Mode for indexing a web page
> 
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch, SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Updated] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3691:
--

Fix Version/s: 5.0

Committed to trunk in r1374497

Will backport to 4.x soon

> SimplePostTool: Mode for indexing a web page
> 
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch, SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Commented] (SOLR-3707) Upgrade Solr to Tika 1.2

2012-08-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437211#comment-13437211
 ] 

Jan Høydahl commented on SOLR-3707:
---

Committed to trunk as r1374501

Will backport to 4.x soon

> Upgrade Solr to Tika 1.2
> 
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3707.patch, SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: 
> http://tika.apache.org/1.2/index.html




[jira] [Commented] (SOLR-3691) SimplePostTool: Mode for indexing a web page

2012-08-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437286#comment-13437286
 ] 

Jan Høydahl commented on SOLR-3691:
---

Fixed javadocs-lint errors in r1374549

> SimplePostTool: Mode for indexing a web page
> 
>
> Key: SOLR-3691
> URL: https://issues.apache.org/jira/browse/SOLR-3691
> Project: Solr
>  Issue Type: Bug
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch, 
> SOLR-3691.patch, SOLR-3691.patch
>
>
> The simple post.jar tool should both show some sample code as well as aid 
> users in testing Solr from the command line. Missing is an easy way to index 
> a web page.




[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS

2012-08-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437317#comment-13437317
 ] 

Jan Høydahl commented on SOLR-3613:
---

I tried to find an API that gives you the webapp name, but could only find it 
based on an actual Request, not statically. Anyone know of a way (short of 
explicit config option in solr.xml)?

> Namespace Solr's JAVA OPTIONS
> -
>
> Key: SOLR-3613
> URL: https://issues.apache.org/jira/browse/SOLR-3613
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.0-ALPHA
>Reporter: Jan Høydahl
> Fix For: 4.0
>
>
> Solr being a web-app, should play nicely in a setting where users deploy it 
> on a shared appServer.
> To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid 
> name clashes and for clarity when reading your appserver startup script. We 
> currently do that with most: {{solr.solr.home, solr.data.dir, 
> solr.abortOnConfigurationError, solr.directoryFactory, 
> solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we 
> fail to do so.
> Before release of 4.0 we should make sure to clean this up.




[jira] [Resolved] (SOLR-3707) Upgrade Solr to Tika 1.2

2012-09-02 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3707.
---

Resolution: Fixed

Committed to 4x in r1380079

> Upgrade Solr to Tika 1.2
> 
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3707.patch, SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: 
> http://tika.apache.org/1.2/index.html




[jira] [Created] (SOLR-3810) I have different numFound between json and xml parser

2012-09-07 Thread JIRA
Yago Riveiro Rodríguez created SOLR-3810:


 Summary: I have different numFound between json and xml parser
 Key: SOLR-3810
 URL: https://issues.apache.org/jira/browse/SOLR-3810
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 4.0-BETA
 Environment: Centos 6.0, kernel 2.6.18-238.9.1.el5 x86_64, tomcat 
7.0.27, solr 4.0-BETA
Reporter: Yago Riveiro Rodríguez
Priority: Critical


This is weird, but I get a different numFound from the json and xml parsers. 

The query is a distributed search; the two requests are identical except for the wt param.

http://192.168.10.27:8080/solr4.0/4A_Stats201006/select?shards=192.168.10.27:8080/solr4.0/4A_Stats201006,192.168.10.27:8080/solr4.0/4A_Stats201007,192.168.10.27:8080/solr4.0/4A_Stats201008&indent=true&q=*:*



0
199



page697463104
697463104
197861290
65987046

http://192.168.10.27:8080/solr4.0/4A_Stats201006/select?shards=192.168.10.27:8080/solr4.0/4A_Stats201006,192.168.10.27:8080/solr4.0/4A_Stats201007,192.168.10.27:8080/solr4.0/4A_Stats201008&indent=true&q=*:*&wt=json

{
  "responseHeader":{
"status":0,
"QTime":169},
  "response":{"numFound":29519009,"start":0,"maxScore":1.0,"docs":[
  {
"id":"page697463104",
"surrogate_id":697463104,
"session_id":"197861290",
"visitor_id":"65987046",


The dumps are truncated because of the sensitive data.
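
One way to compare the two counts mechanically is sketched below in Python. The helper names and the sample bodies are made up for illustration (the sample numbers are not taken from this report); the sketch assumes the usual Solr response shapes, where the XML writer puts numFound in an attribute of the result element and the JSON writer puts it under "response".

```python
import json
import xml.etree.ElementTree as ET


def num_found_json(body: str) -> int:
    """Read numFound from a wt=json response body."""
    return int(json.loads(body)["response"]["numFound"])


def num_found_xml(body: str) -> int:
    """Read numFound from a wt=xml response body."""
    root = ET.fromstring(body)
    result = root.find("./result[@name='response']")
    if result is None:
        raise ValueError("no result element named 'response' in XML body")
    return int(result.attrib["numFound"])


# Made-up sample bodies, small enough to read at a glance.
xml_body = (
    '<response><lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="3" start="0"></result></response>'
)
json_body = '{"responseHeader":{"status":0},"response":{"numFound":3,"start":0,"docs":[]}}'

same_count = num_found_xml(xml_body) == num_found_json(json_body)
```

Fetching both URLs several times and comparing the two values would show whether the difference follows the wt param or varies from request to request.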







[jira] [Updated] (SOLR-3810) I have different numFound between json and xml parser

2012-09-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yago Riveiro Rodríguez updated SOLR-3810:
-

Description: 
This is weird, but I get a different numFound from the json and xml parsers. 

The query is a distributed search; the two requests are identical except for the wt param.

http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*



0
199



page697463104
697463104
197861290
65987046

http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*&wt=json

{
  "responseHeader":{
"status":0,
"QTime":169},
  "response":{"numFound":29519009,"start":0,"maxScore":1.0,"docs":[
  {
"id":"page697463104",
"surrogate_id":697463104,
"session_id":"197861290",
"visitor_id":"65987046",


The dumps are truncated because of the sensitive data.




  was:
This is weird, but I have different numFound between json and xml parser. 

The query is a distributed search, same query only change the wt param.

http://192.168.10.27:8080/solr4.0/4A_Stats201006/select?shards=192.168.10.27:8080/solr4.0/4A_Stats201006,192.168.10.27:8080/solr4.0/4A_Stats201007,192.168.10.27:8080/solr4.0/4A_Stats201008&indent=true&q=*:*



0
199



page697463104
697463104
197861290
65987046

http://192.168.10.27:8080/solr4.0/4A_Stats201006/select?shards=192.168.10.27:8080/solr4.0/4A_Stats201006,192.168.10.27:8080/solr4.0/4A_Stats201007,192.168.10.27:8080/solr4.0/4A_Stats201008&indent=true&q=*:*&wt=json

{
  "responseHeader":{
"status":0,
"QTime":169},
  "response":{"numFound":29519009,"start":0,"maxScore":1.0,"docs":[
  {
"id":"page697463104",
"surrogate_id":697463104,
"session_id":"197861290",
"visitor_id":"65987046",


The dumps are truncate because the sensitive data.





> I have different numFound between json and xml parser
> -
>
> Key: SOLR-3810
> URL: https://issues.apache.org/jira/browse/SOLR-3810
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 4.0-BETA
> Environment: Centos 6.0, kernel 2.6.18-238.9.1.el5 x86_64, tomcat 
> 7.0.27, solr 4.0-BETA
>Reporter: Yago Riveiro Rodríguez
>Priority: Critical
>
> This is weird, but I have different numFound between json and xml parser. 
> The query is a distributed search, same query only change the wt param.
> http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*
> 
> 
> 0
> 199
> 
> 
> 
> page697463104
> 697463104
> 197861290
> 65987046
> http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*&wt=json
> {
>   "responseHeader":{
>     "status":0,
> "QTime":169},
>   "response":{"numFound":29519009,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"page697463104",
> "surrogate_id":697463104,
> "session_id":"197861290",
> "visitor_id":"65987046",
> The dumps are truncate because the sensitive data.




[jira] [Commented] (SOLR-3810) I have different numFound between json and xml parser

2012-09-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451059#comment-13451059
 ] 

Yago Riveiro Rodríguez commented on SOLR-3810:
--

My deployment is very simple: 3 cores (multicore), no SolrCloud or similar. The 
data was inserted into the cores using a custom script.

Either way, if a document can't be parsed to json or something similar, the 
parser should log a warning. 

> I have different numFound between json and xml parser
> -
>
> Key: SOLR-3810
> URL: https://issues.apache.org/jira/browse/SOLR-3810
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 4.0-BETA
> Environment: Centos 6.0, kernel 2.6.18-238.9.1.el5 x86_64, tomcat 
> 7.0.27, solr 4.0-BETA
>Reporter: Yago Riveiro Rodríguez
>Priority: Critical
>
> This is weird, but I have different numFound between json and xml parser. 
> The query is a distributed search, same query only change the wt param.
> http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*
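> (XML response; element tags not preserved)
> responseHeader: status=0, QTime=199
> doc: id=page697463104, surrogate_id=697463104, session_id=197861290, visitor_id=65987046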
> http://localhost:8080/solr4.0/4A_Stats201006/select?shards=localhost:8080/solr4.0/4A_Stats201006,localhost:8080/solr4.0/4A_Stats201007,localhost:8080/solr4.0/4A_Stats201008&indent=true&q=*:*&wt=json
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":169},
>   "response":{"numFound":29519009,"start":0,"maxScore":1.0,"docs":[
>       {
>         "id":"page697463104",
>         "surrogate_id":697463104,
>         "session_id":"197861290",
>         "visitor_id":"65987046",
> The dumps are truncated because they contain sensitive data.
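
One way to check the discrepancy described above is to pull numFound out of both response formats and compare the two values. The following is only a sketch: the function names are hypothetical, and the sample bodies in the usage note are made up for illustration (they are not the reporter's data). It assumes the usual Solr layout, where the XML writer puts numFound on the <result name="response"> element and the JSON writer puts it under "response".

```python
import json
import xml.etree.ElementTree as ET


def num_found_from_json(body: str) -> int:
    """Return response.numFound from a wt=json response body."""
    return int(json.loads(body)["response"]["numFound"])


def num_found_from_xml(body: str) -> int:
    """Return the numFound attribute of <result name="response"> from a wt=xml body."""
    root = ET.fromstring(body)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in XML body")
    return int(result.attrib["numFound"])


def counts_match(xml_body: str, json_body: str) -> bool:
    """True when both writers report the same numFound."""
    return num_found_from_xml(xml_body) == num_found_from_json(json_body)
```

Usage would be to fetch the same URL once without wt and once with wt=json, then pass the two bodies to counts_match. With made-up bodies such as '<response><result name="response" numFound="42" start="0"/></response>' and '{"response":{"numFound":42,"start":0,"docs":[]}}' the two helpers should agree.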




[jira] [Closed] (SOLR-3176) If replicationHandler has both <master> and <slave> sections disabled (enable=false) it should disable itself

2012-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl closed SOLR-3176.
-

Resolution: Won't Fix

This will probably not be a problem with 4.x and beyond; closing.

> If replicationHandler has both <master> and <slave> sections disabled 
> (enable=false) it should disable itself
> -
>
> Key: SOLR-3176
> URL: https://issues.apache.org/jira/browse/SOLR-3176
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 3.5
>Reporter: Jan Høydahl
> Attachments: SOLR-3176.patch
>
>
> Today ReplicationHandler silently starts up, but when a slave tries to pull 
> indexversion, the (wrongly configured) master answers "0" instead of not 
> answering at all, which would be a better response.
> Also, it should log a warning that ReplicationHandler is enabled but has no 
> active master or slave section, and then disable the handler altogether.




[jira] [Updated] (SOLR-3711) Velocity: Break or truncate long strings in facet output

2012-09-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3711:
--

Fix Version/s: 5.0
 Assignee: Jan Høydahl

> Velocity: Break or truncate long strings in facet output
> 
>
> Key: SOLR-3711
> URL: https://issues.apache.org/jira/browse/SOLR-3711
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: /browse
> Fix For: 5.0
>
>
> In the Solritas /browse GUI, if facets contain very long strings (as 
> content-type values tend to), the overlong text currently runs over the main 
> column, which is not pretty.
> Perhaps inserting a soft hyphen (&shy;, 
> http://en.wikipedia.org/wiki/Soft_hyphen) at position N in very long terms 
> is a solution?
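
The soft-hyphen idea above can be sketched as follows. This is only an illustration of the string handling, not the Velocity template change itself; the function name soften and the default of 20 characters are assumptions made for the example.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, what &shy; renders as in HTML


def soften(term: str, n: int = 20) -> str:
    """Insert a soft hyphen after every n characters of term.

    Terms of n characters or fewer are returned unchanged, so short
    facet values are not touched.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Split the term into chunks of at most n characters and rejoin
    # them with the invisible break opportunity.
    chunks = [term[i:i + n] for i in range(0, len(term), n)]
    return SOFT_HYPHEN.join(chunks)
```

A browser only shows the hyphen where it actually breaks the line, so the facet text reads the same when it fits in the column. Note that the inserted characters stay in the string, so this should be applied to the displayed label only, not to the facet value used in filter queries.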



