[jira] [Commented] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
[ https://issues.apache.org/jira/browse/SOLR-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533611#comment-13533611 ]

Peter Wolanin commented on SOLR-4197:
-------------------------------------

OK, but there is no way to enforce that in the configuration, right? At the very least it's a documentation problem, but I would still consider it a problem that I can't lock this down via solrconfig.xml.

EDismax allows end users to use local params in q= to override global params

Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
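Until a parameter like the one proposed exists, an application can defend itself by stripping a leading local-params block from user input before forwarding q= to Solr. A minimal sketch (the class name and regex are illustrative, not part of Solr; the regex does not handle a "}" inside quoted local-param values):

```java
import java.util.regex.Pattern;

public class LocalParamsGuard {
    // Matches a local-params block such as {!dismax qf=title} at the start
    // of the query string, optionally preceded by whitespace. Caveat: a "}"
    // inside a quoted parameter value would end the match early.
    private static final Pattern LEADING_LOCAL_PARAMS =
        Pattern.compile("^\\s*\\{!.*?\\}");

    /** Strip a leading {!...} local-params block from user input. */
    public static String stripLocalParams(String q) {
        return LEADING_LOCAL_PARAMS.matcher(q).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripLocalParams("{!lucene df=text}hello")); // hello
        System.out.println(stripLocalParams("plain query"));            // plain query
    }
}
```

This is an application-side work-around only; it does not address the request that Solr itself be lockable via solrconfig.xml.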
[jira] [Commented] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
[ https://issues.apache.org/jira/browse/SOLR-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533643#comment-13533643 ]

Peter Wolanin commented on SOLR-4197:
-------------------------------------

Apparently adding a space at the beginning is not a complete solution - I then get an exception when it's the standard lucene parser:

{code}
Problem accessing /solr/select. Reason:
org.apache.lucene.queryParser.ParseException: Cannot parse ' {!lucene}hello':
Encountered "}" at line 1, column 9.
Was expecting one of:
    "TO" ...
    <RANGEEX_QUOTED> ...
    <RANGEEX_GOOP> ...
{code}

EDismax allows end users to use local params in q= to override global params

Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.
[jira] [Created] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
Peter Wolanin created SOLR-4197:
-----------------------------------

Summary: EDismax allows end users to use local params in q= to override global params
Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 4.0, 3.6, 3.5
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.
[jira] [Created] (SOLR-4077) Solr params like zkHost are not consistently settable via JNDI - many only via system properties
Peter Wolanin created SOLR-4077:
-----------------------------------

Summary: Solr params like zkHost are not consistently settable via JNDI - many only via system properties
Key: SOLR-4077
URL: https://issues.apache.org/jira/browse/SOLR-4077
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Peter Wolanin
Fix For: 4.0.1, 4.1, 5.0

The Solr home can be set via the JNDI environment, and in general system properties should be used for configuring the container, not the application, since the container may run several web apps. Let's add a helper method to something like SolrResourceLoader.java to look up values like zkHost (to find the ZooKeepers) or hostPort that can currently be set in solr.xml OR in a system property, but not in e.g. a Tomcat context file. The helper would avoid the need to write code to try both options, as currently exists in locateSolrHome().
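The proposed helper could look something like the sketch below: try the JNDI environment first, then fall back to a system property. This is only a proof of concept under assumptions - the JNDI path java:comp/env/solr/<name> mirrors how Solr looks up solr/home, but only solr/home is actually defined by Solr today, and the class and method names are invented for illustration:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ConfigLookup {
    /**
     * Look up a configuration value such as "zkHost" or "hostPort",
     * preferring the JNDI environment and falling back to a system
     * property, so containers like Tomcat can set it per-webapp in a
     * context file instead of a JVM-wide -D flag.
     */
    public static String lookup(String name, String defaultValue) {
        try {
            InitialContext ctx = new InitialContext();
            Object value = ctx.lookup("java:comp/env/solr/" + name);
            if (value != null) {
                return value.toString();
            }
        } catch (NamingException e) {
            // No JNDI context bound (e.g. running outside a servlet
            // container) - fall through to the system property.
        }
        return System.getProperty(name, defaultValue);
    }
}
```

Outside a container the JNDI lookup throws NamingException, so the helper degrades to plain System.getProperty, matching the fallback logic in locateSolrHome().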
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: SOLR-2166.diff

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: SOLR-2166.diff

Here's a patch rolled against Solr 3.5 which I think makes the format into something more compact that doesn't fail on JSON parsing.

BEFORE:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": {
          "position": 49}},
      "abigo": {
        "positions": {
          "position": 5,
          "position": 72}},
{code}

AFTER:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": [49]},
      "abigo": {
        "positions": [5, 72]},
{code}

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
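The "fails on JSON parsing" problem above comes from the duplicate "position" keys in the BEFORE format: a typical JSON consumer stores object members in a map, so repeated keys silently collapse to one. A minimal demonstration of that collapse (plain Java map standing in for a JSON object):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // The BEFORE format emits {"position": 5, "position": 72}.
        // A map-backed JSON consumer keeps only the last value.
        Map<String, Integer> positions = new LinkedHashMap<>();
        positions.put("position", 5);
        positions.put("position", 72); // silently overwrites the first entry
        System.out.println(positions); // {position=72} - position 5 is lost
    }
}
```

The AFTER format's "positions": [5, 72] array sidesteps this entirely, since arrays have no key-uniqueness constraint.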
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: (was: SOLR-2166.diff)

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: workaround-managled-SOLR-2166.diff

As a work-around one could enable access to the unused(?) writeNamedListAsMapMangled() function, which prevents writing duplicate keys. For this work-around patch, use json.nl=mapm instead of json.nl=map to see the behavior.

AFTER:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": {
          "position": 49}},
      "abigo": {
        "positions": {
          "position": 5,
          "position_1": 72}},
{code}

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff, workaround-managled-SOLR-2166.diff

The termvector response format could really be improved.
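The mangling shown above (position, position_1, ...) makes repeated keys unique by appending a counter suffix. A sketch of that deduplication logic, as a standalone helper - the method name and details here are illustrative, not the actual writeNamedListAsMapMangled() implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyMangler {
    /**
     * Make repeated keys unique by appending _1, _2, ... so the output
     * can be written as a JSON map without duplicate keys.
     */
    public static List<String> mangle(List<String> keys) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            Integer count = seen.get(key);
            if (count == null) {
                seen.put(key, 0);
                out.add(key); // first occurrence keeps its original name
            } else {
                seen.put(key, count + 1);
                out.add(key + "_" + (count + 1));
            }
        }
        return out;
    }
}
```

Applied to the duplicate keys in the example, mangle(["position", "position"]) yields ["position", "position_1"], matching the AFTER output.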
[jira] [Updated] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Summary: REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings (was: In Solr 3.2 and trunk the admin/file handler fails to show directory listings)

REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.4, 4.0
Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
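The expected behavior described in the report boils down to a simple decision in the handler: emit a directory listing when no file name is given or when the target is a directory, and stream the file content otherwise. A sketch of that decision logic (class and method names are illustrative, not the actual ShowFileRequestHandler API):

```java
import java.io.File;

public class DirectoryListingDecision {
    /**
     * Decide whether a request to admin/file should produce a directory
     * listing (true) or stream a file's content (false), per the behavior
     * Solr 1.4.1 had and 3.1.0 lost.
     */
    public static boolean shouldList(String fileParam, boolean targetIsDirectory) {
        // No file param at all -> list the conf directory itself.
        if (fileParam == null || fileParam.isEmpty()) {
            return true;
        }
        // file=/xslt points at a sub-directory -> list it too.
        return targetIsDirectory;
    }

    public static void main(String[] args) {
        File conf = new File("conf"); // hypothetical conf directory
        System.out.println(shouldList(null, conf.isDirectory()));
    }
}
```

The 500 error ("did not find a CONTENT object") suggests 3.1.0 falls through to the content-streaming branch even when this decision should have selected the listing branch.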
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052601#comment-13052601 ]

Peter Wolanin commented on SOLR-2462:
-------------------------------------

I generated a patch for 3.2 looking at the commit on branch_3x. It looks somewhat different from the last patch by James. I also just compared the trunk commit to the last patch, and it doesn't match https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch

Did the wrong patch get committed, or was the final patch just never posted to this issue before commit?

Using spellcheck.collate can result in extremely high memory usage
------------------------------------------------------------------

Key: SOLR-2462
URL: https://issues.apache.org/jira/browse/SOLR-2462
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
Fix For: 3.3, 4.0
Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory.

This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010.

We were in production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
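The numbers in the report above make the blow-up concrete: with ~12 misspelled terms and spellcheck.count=15 suggestions per term, ranking every correction combination means enumerating 15^12 candidates. A quick check of that count:

```java
import java.math.BigInteger;

public class CollationExplosion {
    public static void main(String[] args) {
        // 12 misspelled terms, 15 suggestions each: the number of
        // correction combinations the pre-fix iterator would rank.
        BigInteger combinations = BigInteger.valueOf(15).pow(12);
        System.out.println(combinations); // 129746337890625
    }
}
```

Roughly 1.3 x 10^14 combinations - more than enough to explain servers dying in infinite GC loops when the full list is materialized and ranked in memory.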
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Attachment: SOLR-2535.patch

Here's the patch I used. As before, it's just David's with the extra changes omitted.

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3
Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047999#comment-13047999 ]

Peter Wolanin commented on SOLR-2535:
-------------------------------------

Quick test works - I patched the 3.2 source and rebuilt, and the directory and subdirectory listings work as expected. The patch I used is the same as David's but just re-rolled without the changes to SolrDispatchFilter.java. I'm trying to attach it, but jira is throwing a stack trace.

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3
Attachments: SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046122#comment-13046122 ]

Peter Wolanin commented on SOLR-2535:
-------------------------------------

This ought to be a trivial fix, so I hope we can get it in 3.1.1 - or is 3.3 going to be the next minor version?

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Affects Version/s: 3.2
Summary: In Solr 3.2 and trunk the admin/file handler fails to show directory listings (was: In Solr 3.1.0 the admin/file handler fails to show directory listings)

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Fix Version/s: 4.0

Description:
In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

was:
In Solr 4.1.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

Affects Version/s: 4.0

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2, 4.0

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Created] (SOLR-2535) In Solr 3.1.0 the admin/file handler failes to show director listings
In Solr 3.1.0 the admin/file handler failes to show director listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2

In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Description:
In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

was:
In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

Summary: In Solr 3.1.0 the admin/file handler fails to show directory listings (was: In Solr 3.1.0 the admin/file handler failes to show director listings)

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2

In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034994#comment-13034994 ]

Peter Wolanin commented on SOLR-2168:
-------------------------------------

Did this change to the templates get committed to the actual Solr repo?

Velocity facet output for facet missing
---------------------------------------

Key: SOLR-2168
URL: https://issues.apache.org/jira/browse/SOLR-2168
Project: Solr
Issue Type: Bug
Components: Response Writers
Affects Versions: 3.1
Reporter: Peter Wolanin
Priority: Minor
Attachments: SOLR-2168.patch

If I add facet.missing to the facet params for a field, the Velocity output shows in the facet list: $facet.name (9220)
[jira] [Commented] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024649#comment-13024649 ]

Peter Wolanin commented on SOLR-232:
------------------------------------

Looks like the title needs to change? From looking at the Solr 1.4 code, it seems this issue is now about setting RESPONSE headers? That's certainly the use case I have in mind, and what seems to be commented out in the Solr 1.4 code: https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java

{code}
// add info to http headers
//TODO: See SOLR-232 and SOLR-267.
/*try {
  NamedList solrRspHeader = solrRsp.getResponseHeader();
  for (int i=0; i<solrRspHeader.size(); i++) {
    ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
  }
} catch (ClassCastException cce) {
  log.log(Level.WARNING, "exception adding response header log information", cce);
}*/
{code}

However, the things currently sent in the response header seem to be missing the # of matches (logged as hits), and I'm not sure I'd want all the params sent back as headers by default. So, maybe we need a method like solrRsp.getHttpResponseHeader() instead of using solrRsp.getResponseHeader(), and corresponding setters?

let Solr set request headers (for logging)
------------------------------------------

Key: SOLR-232
URL: https://issues.apache.org/jira/browse/SOLR-232
Project: Solr
Issue Type: New Feature
Environment: tomcat?
Reporter: Ian Holsman
Priority: Minor
Attachments: meta.patch

I need the ability to log certain information about a request so that I can feed it into performance and capacity monitoring systems. I would like to know things like:
- how long the request took
- how many rows were fetched and returned
- what handler was called
per request. The following patch is one way to implement this; I'm sure there are better ways.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
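The commented-out snippet in SolrDispatchFilter above exports every responseHeader entry wholesale; the comment proposes a dedicated accessor instead. A self-contained sketch of that idea (plain java.util maps standing in for Solr's NamedList; the selection rule and the "X-Solr-" prefix follow the patch discussed later in this thread, but the helper itself is hypothetical, not existing Solr API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderSketch {
    // Hypothetical stand-in for a solrRsp.getHttpResponseHeader() accessor:
    // only entries the handler explicitly marked for HTTP export arrive here.
    static Map<String, String> httpHeaders(Map<String, Object> marked) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : marked.entrySet()) {
            if (e.getValue() != null) {  // skip nulls: avoids "X-Solr-Hits: null"
                out.put("X-Solr-" + e.getKey(), String.valueOf(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> marked = new LinkedHashMap<>();
        marked.put("Hits", 42L);
        marked.put("QTime", null);                // e.g. ping handler: no hits
        System.out.println(httpHeaders(marked));  // {X-Solr-Hits=42}
    }
}
```

Keeping the export list separate from the full responseHeader also addresses the concern above about leaking all request params back as headers.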
[jira] [Commented] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024651#comment-13024651 ] Peter Wolanin commented on SOLR-232: In addition, or instead, we could make which elements from the responseHeader are set as HTTP response headers configurable in solrconfig.xml for each request handler?
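To illustrate the per-handler idea, here is a purely hypothetical solrconfig.xml fragment - the httpResponseHeaders element and its shape are invented for this sketch and do not exist in Solr:

{code}
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- hypothetical: map selected responseHeader entries to HTTP headers -->
  <lst name="httpResponseHeaders">
    <str name="hits">X-Solr-Hits</str>
    <str name="QTime">X-Solr-QTime</str>
  </lst>
</requestHandler>
{code}

An explicit allow-list like this would keep arbitrary params from leaking into headers by default.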
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: SOLR-232.patch Here's a patch against Solr 3.1, just as a proof of concept, that adds the hits as a response header X-Solr-Hits. Apparently this code has been commented out so long that the log call and other things changed.
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: (was: SOLR-232.patch)
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: SOLR-232.patch Hmm, that check isn't quite right - the ping handler ends up getting: X-Solr-Hits: null since String.valueOf(Object) returns the string "null" when its argument is null. This better POC patch checks for null in the right place. Deleted the old one.
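The String.valueOf behavior that produced the bogus header is easy to reproduce in isolation - this is standard Java library behavior, not anything Solr-specific:

```java
public class ValueOfNullDemo {
    public static void main(String[] args) {
        Object hits = null;  // e.g. the ping handler reports no hit count
        // String.valueOf(Object) maps a null argument to the literal
        // four-character string "null" rather than a null reference,
        // which is why the header read "X-Solr-Hits: null".
        String headerValue = String.valueOf(hits);
        System.out.println(headerValue);          // null  (the string)
        System.out.println(headerValue == null);  // false
    }
}
```

Hence the fix in the patch: test the value for null before building the header, not after stringifying it.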
[jira] Created: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()
java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord() Key: SOLR-2266 URL: https://issues.apache.org/jira/browse/SOLR-2266 Project: Solr Issue Type: Bug Affects Versions: 1.4.1 Environment: Mac OS 10.6 java version 1.6.0_22 Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261) Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode) Reporter: Peter Wolanin I have been testing a switch to long and tdate instead of int and date fields in the schema.xml for our Drupal integration. This indexes fine, but search fails with a 500 error.
{code}
INFO: [d7] webapp=/solr path=/select params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0} status=500 QTime=4
Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
	at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
	at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
	at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
	at org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
	at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
	at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
	at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
	at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
	at org.apache.lucene.search.Searcher.search(Searcher.java:171)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
{code}
The exception goes away if I remove the boost function param bf=recip(rord(created),4,19,19)^200.0 Omitting the recip() doesn't help, so just bf=rord(created)^200.0 still causes the exception. In
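A workaround not mentioned in this report, but consistent with the Solr function-query documentation of that era: ord()/rord() are unreliable on trie fields, because a trie field with precisionStep > 0 indexes several terms per value, so the ordinal arrays in the field cache no longer line up with documents (matching the ArrayIndexOutOfBoundsException above). A millisecond-based reciprocal boost avoids the ordinal arrays entirely:

{code}
bf=recip(ms(NOW,created),3.16e-11,1,1)^200.0
{code}

The constant 3.16e-11 is the documented "1 / one year in ms" scaling; tune the other recip() constants to taste.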
[jira] Issue Comment Edited: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921852#action_12921852 ] Peter Wolanin edited comment on SOLR-2168 at 10/17/10 10:25 AM: Those all sound like good changes. In terms of templating, I'd find something like erb, or PHP, or JSP much easier, and I imagine many more people are familiar with those. So far, I feel like it's hard to understand in Velocity how variables and control structures are distinguished from the output, and it's not clear that it's a real template in terms of the way e.g. white space is handled or not. This is especially true in the case of macro output, where it seems like e.g. the carriage returns and spaces I'd naturally include in control structures to make them readable become part of the output. The variable handling is also weird - I need to use #set() for actual assignment? In terms of readability, look, for example, at this bit: {code} <li><a href="#url_for_home#lens&fq=$esc.url( {code} The fq= is output in the middle of a series of macro and function calls, but nothing visually distinguishes them. Can I define new functions instead of macros? If a macro call could be written as #{url_for_home} it would provide more visual separation. I notice in the patch you have: {code} -${field.name}:[* TO *] {code} Looks like the function call can be written like this? {code} ${esc.url(-${field.name}:[* TO *])} {code} Velocity facet output for facet missing --- Key: SOLR-2168 URL: https://issues.apache.org/jira/browse/SOLR-2168 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 3.1 Reporter: Peter Wolanin Priority: Minor Attachments: SOLR-2168.patch If I add facet.missing to the facet params for a field, the Velocity output has in the facet list: $facet.name (9220)
[jira] Commented: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921855#action_12921855 ] Peter Wolanin commented on SOLR-2168: - If you want to start using git more widely for development (assuming people still post the final patches as attachments here), you might want to set up a canonical mirror some place on github so that everyone uses the same initial tree. We have this for Drupal: http://github.com/drupal/drupal and mirroring out of svn is probably even easier if someone has a server and can just run a script on cron every ~15 min.
[jira] Updated: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-2168: Attachment: SOLR-2168.patch Attaching a functional (if not elegant) fix - I find the Velocity template syntax a little ... annoying.
[jira] Created: (SOLR-2149) Allow copyField directives to be controlled by another (boolean) field
Allow copyField directives to be controlled by another (boolean) field - Key: SOLR-2149 URL: https://issues.apache.org/jira/browse/SOLR-2149 Project: Solr Issue Type: New Feature Reporter: Peter Wolanin Thinking about alternative approaches to the problem outlined in SOLR-2010, it occurs to me that there are many cases where it would be useful to be able to control copyField behavior rather than having to fully populate or omit document fields. In regards to spellcheck, I could then have a few different spellcheck indexes, each built from a different field, and indicate for each document whether its text should be added to each of the different spellcheck fields. I'm imagining a general syntax like this: {code} <copyField source="body" dest="teaser" maxChars="300" controlField="populate_teaser"/> {code} I'm not sure if Solr would/could use the value of a control field unless it matches the ignored field type, but that's what I'm thinking about as one possibility. In other words, I can pass index-time flags into the document that are reflected in the terms of what's indexed but not explicitly stored in the document.
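A self-contained sketch of the proposed semantics (plain java.util maps standing in for Solr documents; the controlField behavior is the proposal above, not anything Solr implements):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyFieldSketch {
    /**
     * Copy src into dest (truncated to maxChars) only when the boolean
     * control field is present and true -- the behavior proposed above.
     */
    static void copyField(Map<String, Object> doc, String src, String dest,
                          int maxChars, String controlField) {
        if (!Boolean.TRUE.equals(doc.get(controlField))) {
            return;  // control field absent or false: skip the copy
        }
        Object value = doc.get(src);
        if (value instanceof String) {
            String s = (String) value;
            doc.put(dest, s.length() > maxChars ? s.substring(0, maxChars) : s);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("body", "some long body text");
        doc.put("populate_teaser", true);
        copyField(doc, "body", "teaser", 9, "populate_teaser");
        System.out.println(doc.get("teaser")); // some long
    }
}
```

In the real feature the decision would happen at index time inside Solr's document-processing path, and the control field itself could use an ignored field type so the flag never reaches the index.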
[jira] Updated: (SOLR-2149) Allow copyField directives to be controlled by another (boolean) field
[ https://issues.apache.org/jira/browse/SOLR-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-2149: Attachment: SOLR-2149.patch The attached patch against 1.4 is not at all functional - just taking a rough look at where the code would need to be modified.
[jira] Commented: (SOLR-1967) New Native PHP Response Writer Class
[ https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905186#action_12905186 ] Peter Wolanin commented on SOLR-1967: - To my mind, the PHP response writer should just be removed. PHP has had a number of security issues around unserializing data, and in most languages unserializing potentially untrusted data may be a security vulnerability. New Native PHP Response Writer Class Key: SOLR-1967 URL: https://issues.apache.org/jira/browse/SOLR-1967 Project: Solr Issue Type: New Feature Components: clients - php, Response Writers Affects Versions: 1.4 Reporter: Israel Ekpo Fix For: 1.5, 3.1, 4.0, Next Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar Original Estimate: 0h Remaining Estimate: 0h Hi Solr users, If you are using Apache Solr via PHP, I have some good news for you. There is a new response writer for the PHP native extension, currently available as a plugin. This new feature adds a new response writer class to the org.apache.solr.request package. This class is used by the PHP Native Solr Client driver to prepare the query response from Solr. This response writer allows you to configure the way the data is serialized for the PHP client. You can use your own class name and you can also control how the properties are serialized as well. The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side. The only difference now is that this serialization is happening on the server side instead. You will find this new response writer particularly useful when dealing with responses for - highlighting - admin threads responses - more like this responses to mention just a few. You can pass the objectClassName request parameter to specify the class name to be used for serializing objects.
Please note that the class must be available on the client side to avoid a PHP_Incomplete_Object error during the unserialization process. You can also pass in the objectPropertiesStorageMode request parameter with either a 0 (independent properties) or a 1 (combined properties). These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file. Having this control allows you to create custom objects, which gives the flexibility of implementing custom __get methods, ArrayAccess, Traversable and Iterator interfaces on the PHP client side. Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME. The jar file is available here and so is the source code. Then set up the configuration as shown below and then restart your servlet container. Below is an example configuration in solrconfig.xml:
{code}
<queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
  <!-- You can choose a different class for your objects.
       Just make sure the class is available in the client -->
  <str name="objectClassName">SolrObject</str>
  <!-- 0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
       1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED
       In independent mode, each property is a separate property.
       In combined mode, all the properties are merged into a _properties array.
       The combined mode allows you to create custom __getters and you could also
       implement ArrayAccess, Iterator and Traversable -->
  <int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>
{code}
Below is an example implementation on the PHP client side. Support for specifying custom response writers will be available starting from the 0.9.11 version of the PECL extension for Solr, currently available here http://pecl.php.net/package/solr Here is an example of how to use the new response writer with the PHP client.
{code}
<?php
class SolrClass {
  public $_properties = array();

  public function __get($property_name) {
    if (property_exists($this, $property_name)) {
      return $this->$property_name;
    }
    else if (isset($this->_properties[$property_name])) {
      return $this->_properties[$property_name];
    }
    return null;
  }
}

$options = array(
  'hostname' => 'localhost',
  'port' => 8983,
  'path' => '/solr/'
);

$client = new SolrClient($options);
$client->setResponseWriter("phpnative");
$response = $client->ping();

$query = new SolrQuery();
$query->setQuery("*:*");
$query->set("objectClassName", "SolrClass");
$query->set("objectPropertiesStorageMode", 1);

$response = $client->query($query);
$resp = $response->getResponse();
?>
{code}
Documentation of the changes to the PECL extension are available here http://docs.php.net/manual/en/solrclient.construct.php http://docs.php.net/manual/en/solrclient.setresponsewriter.php Please contact me at
[jira] Commented: (SOLR-1819) Upgrade to Tika 0.7
[ https://issues.apache.org/jira/browse/SOLR-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880685#action_12880685 ] Peter Wolanin commented on SOLR-1819: - As a side note, looks like Solr trunk is using a 0.8 snapshot of Tika. Upgrade to Tika 0.7 --- Key: SOLR-1819 URL: https://issues.apache.org/jira/browse/SOLR-1819 Project: Solr Issue Type: Improvement Reporter: Tricia Williams Assignee: Grant Ingersoll Priority: Minor Fix For: Next See title.
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873461#action_12873461 ] Peter Wolanin commented on SOLR-1852: - Yes, I'd propose to have this in 1.4.1, since it's a pretty serious bug in the places where it manifests. enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Robert Muir Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence - "Support Identi.ca" - the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory
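The workaround described above, expressed as a schema.xml fragment (the stopwords file name is the stock Solr example, not taken from this report):

{code}
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="false"/>
{code}

Note the trade-off: with position increments disabled, phrase queries can no longer tell that a stop word once sat between two terms, and a full reindex is required for the change to take effect.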
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871123#action_12871123 ] Peter Wolanin commented on SOLR-1852: - I'm thinking about 1.4 backporting - not sure what's happening with 1.5. Yes, you'd have to re-index if we have to backport to 1.4, but I assume that's only going to affect documents that would currently have broken searches?
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870624#action_12870624 ] Peter Wolanin commented on SOLR-1852: - Now this has been in trunk longer, do you feel any more confident about a back port?
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852233#action_12852233 ] Peter Wolanin commented on SOLR-1852: - I'm confused by that comment - I thought this code is already in 1.5/trunk and the issue is backporting to the 1.4 branch? enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Robert Muir Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34content-type=text%2Fplainview=copathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: would be great to have support for Identi.ca on the follow block fails to match Identi.ca, but putting the content on its own or in another sentence: Support Identi.ca the search matches. Testing suggests the word for is the problem, and it looks like the bug occurs when a stop word preceeds a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. 
According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
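The mechanism described above can be illustrated with a minimal, self-contained sketch (this models the concept only; it is not Solr or Lucene code, and `analyze`/`phraseMatch` are hypothetical helpers): with enablePositionIncrements=true, a removed stop word still advances the token position, leaving a gap in the index, and a phrase query's terms only match when their relative offsets line up with the positions the index analyzer actually assigned.

```java
import java.util.*;

// Illustrative model (NOT Lucene code) of stop-word position gaps and
// exact-position phrase matching.
public class PositionGapSketch {

    // Lower-case, split on non-word chars, drop stop words; if keepGaps is
    // true (enablePositionIncrements=true) removed words still advance pos.
    static Map<String, List<Integer>> analyze(String text, Set<String> stops, boolean keepGaps) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (stops.contains(w)) { if (keepGaps) pos++; continue; }
            index.computeIfAbsent(w, k -> new ArrayList<>()).add(pos++);
        }
        return index;
    }

    // Phrase match: every term must occur at start + its recorded offset.
    static boolean phraseMatch(Map<String, List<Integer>> index, String[] terms, int[] offsets) {
        List<Integer> first = index.get(terms[0]);
        if (first == null) return false;
        outer:
        for (int start : first) {
            for (int i = 1; i < terms.length; i++) {
                List<Integer> p = index.get(terms[i]);
                if (p == null || !p.contains(start + offsets[i])) continue outer;
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stops = Set.of("for");
        // Indexed with gaps: support@0, (gap for "for"), identi@2, ca@3.
        Map<String, List<Integer>> withGaps = analyze("support for identi ca", stops, true);
        // A phrase whose own analysis produced a gap (offsets 0 and 2)
        // matches the gap-indexed doc but not one indexed without gaps.
        String[] phrase = {"support", "identi"};
        System.out.println(phraseMatch(withGaps, phrase, new int[]{0, 2}));   // true
        Map<String, List<Integer>> noGaps = analyze("support for identi ca", stops, false);
        System.out.println(phraseMatch(noGaps, phrase, new int[]{0, 2}));     // false
    }
}
```

The point of the sketch: once the query side and the index side disagree about whether stop words leave positional gaps, exact-position phrase matching silently fails, which is consistent with the symptom that flipping enablePositionIncrements=false and reindexing makes the searches match.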
[jira] Created: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Attachment: SOLR-1852.patch This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608 ] Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:41 PM: --- This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). was (Author: pwolanin): This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Description: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory was: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. 
So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory Summary: enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries (was: enablePositionIncrements=true causes searches to fail when they are parse as phrase queries) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop
[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Description: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory was: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false
[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608 ] Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:52 PM: --- This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated WordDelimiterFilter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). was (Author: pwolanin): This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850610#action_12850610 ] Peter Wolanin commented on SOLR-1852: - The changes in the patch originate at SOLR-1706 and SOLR-1657, however I don't think it's actually the same bug as SOLR-1706 intended to fix, since in the admin analyzer interface the generated tokens look correct. enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. 
Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805303#action_12805303 ] Peter Wolanin commented on SOLR-1553: - some commented out debug code left in the committed parser?
{code}
protected void addClause(List clauses, int conj, int mods, Query q) {
  //System.out.println("addClause:clauses="+clauses+" conj="+conj+" mods="+mods+" q="+q);
  super.addClause(clauses, conj, mods, q);
}
{code}
extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 1.5 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (TIKA-338) Trying to use -encoding parameter always results in an exception
[ https://issues.apache.org/jira/browse/TIKA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin closed TIKA-338. -- Resolution: Invalid Trying to use -encoding parameter always results in an exception Key: TIKA-338 URL: https://issues.apache.org/jira/browse/TIKA-338 Project: Tika Issue Type: Bug Components: cli Reporter: Peter Wolanin Fix For: 0.6 Original Estimate: 1h Remaining Estimate: 1h There is a logical error in the CLI code - -encoding can never work and always results in an exception $ java -jar tika-app/target/tika-app-0.6-SNAPSHOT.jar -encoding=UTF-8 -t test.txt Exception in thread "main" java.io.UnsupportedEncodingException: ncoding=UTF-8 at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:42) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
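The bogus charset name in the stack trace ("ncoding=UTF-8", i.e. the argument minus its first two characters) hints at the kind of prefix-stripping slip involved. The sketch below is a hypothetical reconstruction, not Tika's actual code; `buggyParse` and `fixedParse` are made-up helpers that only illustrate the off-by-prefix behavior:

```java
// Hypothetical reconstruction (NOT Tika's code) of an argument-parsing
// slip that turns "-encoding=UTF-8" into the charset name "ncoding=UTF-8":
// the code strips a 2-character "-e" prefix where it should strip the
// whole "-encoding=" prefix before handing the rest to a writer.
public class EncodingArgBug {
    static String buggyParse(String arg) {
        return arg.substring(2);                          // drops only "-e"
    }

    static String fixedParse(String arg) {
        return arg.substring("-encoding=".length());      // drops "-encoding="
    }

    public static void main(String[] args) {
        System.out.println(buggyParse("-encoding=UTF-8")); // ncoding=UTF-8
        System.out.println(fixedParse("-encoding=UTF-8")); // UTF-8
    }
}
```

Passing the buggy result to an `OutputStreamWriter` would throw exactly the `UnsupportedEncodingException: ncoding=UTF-8` shown in the report.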
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324: --- Attachment: TIKA-324-README.patch Here's a little follow-up patch for the README file to document this further. Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Assignee: Jukka Zitting Priority: Critical Fix For: 0.6 Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324-README.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778926#action_12778926 ] Peter Wolanin commented on TIKA-324: In fact for tika 0.4 it looks like it works already to pass this option to java: -Dfile.encoding=UTF8 $ java -Dfile.encoding=UTF8 -jar orig-tika-app-0.4.jar -t ./test.txt Iñtërnâtiônàlizætiøn Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
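The -Dfile.encoding workaround works because `System.out` encodes characters with the JVM's default charset. A program can sidestep the platform default entirely by giving its output stream an explicit charset. A minimal sketch of that idea (generic Java, not Tika's code; the `encodeUtf8` helper is mine, for demonstration):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Sketch: make console output independent of the platform default charset
// (the thing -Dfile.encoding and $LANG change) by constructing the
// PrintStream with an explicit encoding instead of using System.out as-is.
public class Utf8Out {
    // Encode a string through a PrintStream pinned to UTF-8, capturing bytes.
    static byte[] encodeUtf8(String s) throws UnsupportedEncodingException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream p = new PrintStream(buf, true, "UTF-8"); // explicit charset
        p.print(s);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "ñ" is two bytes in UTF-8; with an ASCII-ish default charset it
        // would degrade to '?' - the mangling seen in the bug report.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n");
    }
}
```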
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778928#action_12778928 ] Peter Wolanin commented on TIKA-324: Also, this is not a Mac-only problem - I have the same issue, for example, on CentOS using java version 1.6.0_04 [r...@i:~] java -jar tika-app-0.4.jar -t test.txt I?t?rn?ti?n?liz?ti?n Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778951#action_12778951 ] Peter Wolanin commented on TIKA-324: on Mac OS 10.5 it looks correct: $ echo $LANG en_US.UTF-8 on CentOS 5, no value is set: echo $LANG If I set that value on CentOS (to the same as my Mac) then output is correct: [r...@i:~] export LANG=en_US.UTF-8 [r...@i:~] java -jar tika-app-0.4.jar -t test.txt Iñtërnâtiônàlizætiøn Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.4, 0.3 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324: --- Attachment: test.txt Attaching a little test text file. Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778133#action_12778133 ] Peter Wolanin commented on TIKA-324: Examining the TikaCLI.java code, the xhtml versus text output is handled very differently. I'm not sure why the text one fails, but it seems to be easily rectified by applying the transformer using "text" as the method. Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
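The approach the comment describes (serializing the text output through the same transformer machinery that the -x path uses, with "text" as the output method) can be sketched with the standard JAXP identity transformer. This is illustrative of the idea only, not the literal TikaCLI patch, and `toText` is a hypothetical helper:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: route plain-text extraction through a JAXP identity Transformer
// with method="text" and an explicit UTF-8 encoding, mirroring how the
// XHTML (-x) path serializes. (Illustrative; not the actual TikaCLI fix.)
public class TextSerializerSketch {
    static String toText(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity
        t.setOutputProperty(OutputKeys.METHOD, "text");     // emit character data only
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");  // fixed, not platform default
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Strips markup, keeps the multi-byte text intact.
        System.out.println(toText("<p>I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n</p>"));
    }
}
```

The design point: a serializer with a pinned output encoding cannot be broken by whatever default charset the JVM happens to start with, which is exactly the failure mode the -t path exhibited.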
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Attachment: TIKA-324.patch

Attached is a patch against Tika 0.4. It resolves the bug for me, at least for the simple test case.

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
Iñtërnâtiônàlizætiøn

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}
[jira] Issue Comment Edited: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778134#action_12778134 ] Peter Wolanin edited comment on TIKA-324 at 11/15/09 6:01 PM:
--

Attached is a patch against Tika 0.4. It resolves the bug for me, at least for the simple test case.

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
Iñtërnâtiônàlizætiøn

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778135#action_12778135 ] Peter Wolanin commented on TIKA-324:

Note: the test string's origin is http://intertwingly.net/stories/2004/04/14/i18n.html
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Description:

When using the -t flag to tika, multi-byte content is destroyed in the output. Example:

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}

see also: http://drupal.org/node/622508#comment-2267918

The bug is confirmed as present in 0.3 also.
[jira] Issue Comment Edited: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778140#action_12778140 ] Peter Wolanin edited comment on TIKA-324 at 11/15/09 6:20 PM:
--

The bug is confirmed as present in 0.3 also.

{code}
$ java -jar tika-0.3.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
{code}
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778148#action_12778148 ] Peter Wolanin commented on TIKA-324:

The bug is still present in trunk (and in the code tagged for 0.5):

{code}
$ java -jar tika-app/target/tika-app-0.6-SNAPSHOT.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
{code}
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Attachment: TIKA-324.patch
            TIKA-324-0.5.patch

Here is a patch for tika 0.5/trunk that resolves the bug (a 1-line change) and a revised patch for 0.4 that sets indent to true for consistency. For a quick test PDF, look at http://nlp.stanford.edu/IR-book/pdf/00front.pdf. Without the patch, math symbols like ω and ωk are obliterated.
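The "?" output in the examples above is the classic symptom of text being pushed through a writer that cannot represent the characters, e.g. one built with the platform default encoding. A small sketch (not Tika code; the class and method names are my own) showing how the charset chosen for the writer decides whether multi-byte characters survive:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf8WriterSketch {
    // Encodes a string through a Writer using the named charset and returns
    // the raw bytes, mimicking a CLI tool writing extracted text to stdout.
    public static byte[] encode(String s, String charset) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(buf, charset);
        out.write(s);
        out.flush();
        return buf.toByteArray();
    }
}
```

Encoding "Iñtërnâtiônàlizætiøn" with "UTF-8" round-trips cleanly, while an encoding such as "US-ASCII" replaces every unmappable character with "?", which is exactly what the unpatched -t output shows.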
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771932#action_12771932 ] Peter Wolanin commented on SOLR-874:

Does anyone have an approach for this bug, so we can get it fixed before 1.4 is done?

Dismax parser exceptions on trailing OPERATOR

Key: SOLR-874
URL: https://issues.apache.org/jira/browse/SOLR-874
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher
Attachments: SOLR-874.patch

Dismax is supposed to be immune to parse exceptions, but alas it's not:

http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND

kaboom!

{code}
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered EOF at line 1, column 8.
Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ...
	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
	at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
	at org.apache.solr.search.QParser.getQuery(QParser.java:88)
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752468#action_12752468 ] Peter Wolanin commented on SOLR-1400:

These lines seem to vary as to whether there is whitespace between char and the []:

{code}
@@ -29,29 +30,48 @@ public class TestTrimFilter extends BaseTokenTestCase {
 public void testTrim() throws Exception {
+    char[] a = " a ".toCharArray();
+    char [] b = "b   ".toCharArray();
+    char [] ccc = "cCc".toCharArray();
+    char[] whitespace = "   ".toCharArray();
+    char[] empty = "".toCharArray();
{code}

Document with empty or white-space only string causes exception with TrimFilter

Key: SOLR-1400
URL: https://issues.apache.org/jira/browse/SOLR-1400
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
Fix For: 1.4
Attachments: SOLR-1400.patch, trim-example.xml

Observed with Solr trunk. Posting any empty or whitespace-only string to a field using the
{code}
<filter class="solr.TrimFilterFactory"/>
{code}
causes a java exception:

{code}
Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
	at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{code}

Trim of an empty or WS-only string should not fail.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
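The ArrayIndexOutOfBoundsException above comes from trimming a token that is empty or whitespace-only. A rough sketch (not Solr's TrimFilter, just the underlying guard) of a trim over a char buffer that handles that case instead of indexing past the ends:

```java
public class TrimSketch {
    // Trims leading/trailing whitespace from buf[start..start+len), guarding
    // the empty / whitespace-only case that triggered the exception: the two
    // scans simply meet and yield an empty result instead of a bad index.
    public static String trim(char[] buf, int start, int len) {
        int s = start, e = start + len;
        while (s < e && Character.isWhitespace(buf[s])) s++;
        while (e > s && Character.isWhitespace(buf[e - 1])) e--;
        return new String(buf, s, e - s);  // "" for empty or all-whitespace input
    }
}
```

With this guard, a whitespace-only field value trims to the empty string rather than failing the whole document add.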
[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752245#action_12752245 ] Peter Wolanin commented on SOLR-1400:

The patch seems to fix the bug for me, but there seems to be some code style inconsistency in the test code.
[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.
[ https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751038#action_12751038 ] Peter Wolanin commented on SOLR-756:

We are regularly hitting this wall, and users are very frustrated by not being able to use wildcards, because we wanted the other advantages of the dismax parser. Any chance to get some of these changes into 1.4?

Make DisjunctionMaxQueryParser generally useful by supporting all query types.

Key: SOLR-756
URL: https://issues.apache.org/jira/browse/SOLR-756
Project: Solr
Issue Type: Improvement
Affects Versions: 1.3
Reporter: David Smiley
Fix For: 1.5
Attachments: SolrPluginUtilsDisMax.patch

This is an enhancement to the DisjunctionMaxQueryParser to work on all the query variants, such as wildcard, prefix, and fuzzy queries, and to support working in AND scenarios that are not processed by the min-should-match DisMax QParser. This was not in Solr already because DisMax was only used for a very limited syntax that didn't use those features. In my opinion, this makes a more suitable base parser for general use because, unlike the Lucene/Solr parser, this one supports multiple default fields, whereas other ones (say Yonik's {!prefix} one, for example) can't do dismax. The notion of a single default field is antiquated and a technical under-the-hood detail of Lucene that I think Solr should shield the user from by on-the-fly using a DisMax when multiple fields are used. (patch to be attached soon)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1400:
---

Attachment: trim-example.xml

Post the attached document using the trunk sample schema.xml to reproduce.
[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1274:
---

Attachment: SOLR-1274.patch

Here's a patch that's nearly there, but somehow I'm missing something in how java behaves. The param is getting picked up, but this line never evals as true, even when the param is parsed right:

{code}
if (extractFormat == "text") {
{code}

If I set it to

{code}
if (true) {
{code}

I get the desired text-only output.

Provide multiple output formats in extract-only mode for tika handler

Key: SOLR-1274
URL: https://issues.apache.org/jira/browse/SOLR-1274
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
Fix For: 1.4
Attachments: SOLR-1274.patch

The proposed feature is to accept a URL parameter when using extract-only mode to specify an output format. This parameter might just overload the existing ext.extract.only so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default). I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar:

{code}
-x or --xml       Output XHTML content (default)
-h or --html      Output HTML content
-t or --text      Output plain text content
-m or --metadata  Output only metadata
{code}

However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader.java:

{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
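The reason the comparison above never evaluates as true is most likely that == on Strings compares object references, not contents. A request-parameter value is a String constructed at runtime, so it is a different object from the literal "text" even when the characters match; .equals() is what compares contents. A minimal demonstration (illustrative names only):

```java
public class StringCompareSketch {
    public static void main(String[] args) {
        // A parameter value parsed from a request is built at runtime, so it is
        // a different object from the compile-time literal "text" even when the
        // contents match. new String(...) forces a distinct object here.
        String extractFormat = new String("text");
        System.out.println(extractFormat == "text");       // reference comparison: false
        System.out.println("text".equals(extractFormat));  // content comparison: true
    }
}
```

Writing the literal first ("text".equals(extractFormat)) also avoids a NullPointerException when the parameter is absent.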
[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1274:
---

Attachment: SOLR-1274.patch

Well, indeed - something like that works better.
[jira] Commented: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731437#action_12731437 ] Peter Wolanin commented on SOLR-1274:

A minimal version of this would be pretty trivial as far as features go, and I'd thought Yonik was indicating on the e-mail list that it would be a reasonable follow-on to his last patch in the linked issue.
[jira] Updated: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-874:
---

Attachment: SOLR-874.patch

Here's a simple patch that escapes with a \. It prevents the exception; however, this fails to match and/or/not (after removing those from the stopwords file), so it's clearly not quite right.
[jira] Created: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
Provide multiple output formats in extract-only mode for tika handler

Key: SOLR-1274
URL: https://issues.apache.org/jira/browse/SOLR-1274
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
Fix For: 1.4

The proposed feature is to accept a URL parameter when using extract-only mode to specify an output format. This parameter might just overload the existing ext.extract.only so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default). I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar:

{code}
-x or --xml       Output XHTML content (default)
-h or --html      Output HTML content
-t or --text      Output plain text content
-m or --metadata  Output only metadata
{code}

However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader.java:

{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730492#action_12730492 ] Peter Wolanin commented on SOLR-874:

I get the same sort of exception with a *leading* operator and the dismax handler.

{code}
Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered OR OR at line 1, column 0.
Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ...
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
{code}
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730513#action_12730513 ] Peter Wolanin commented on SOLR-874: possibly a fix could be rolled into this existing method in SolrPluginUtils.java ? {code}
/**
 * Strips operators that are used illegally, otherwise returns its
 * input. Some examples of illegal user queries are: "chocolate +-
 * chip", "chocolate - - chip", and "chocolate chip -".
 */
public static CharSequence stripIllegalOperators(CharSequence s) {
  String temp = CONSECUTIVE_OP_PATTERN.matcher( s ).replaceAll(" ");
  return DANGLING_OP_PATTERN.matcher( temp ).replaceAll(" ");
}
{code} This seems only to be called from: org/apache/solr/search/DisMaxQParser.java:156: userQuery = SolrPluginUtils.stripIllegalOperators(userQuery).toString(); Dismax parser exceptions on trailing OPERATOR - Key: SOLR-874 URL: https://issues.apache.org/jira/browse/SOLR-874 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Erik Hatcher Dismax is supposed to be immune to parse exceptions, but alas it's not: http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND kaboom! Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered "<EOF>" at line 1, column 8. Was expecting one of: <NOT> ... "+" ... "-" ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ... at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175) at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138) at org.apache.solr.search.QParser.getQuery(QParser.java:88) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
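A self-contained sketch of the stripping the comment above points at. The regexes here are my own guesses covering only AND/OR/NOT (the real CONSECUTIVE_OP_PATTERN and DANGLING_OP_PATTERN in SolrPluginUtils are not shown in the comment and also handle +/-), so treat this as an illustration of the approach, not the actual patch.

```java
import java.util.regex.Pattern;

public class QueryCleaner {
    // Hypothetical patterns; the real ones in SolrPluginUtils may differ.
    // Matches runs of two or more AND/OR/NOT operators in a row.
    private static final Pattern CONSECUTIVE_OP_PATTERN =
        Pattern.compile("\\b(?:AND|OR|NOT)(?:\\s+(?:AND|OR|NOT))+\\b");
    // Matches an operator dangling at the start or end of the query.
    private static final Pattern DANGLING_OP_PATTERN =
        Pattern.compile("^\\s*(?:AND|OR|NOT)\\b|\\b(?:AND|OR|NOT)\\s*$");

    public static CharSequence stripIllegalOperators(CharSequence s) {
        String temp = CONSECUTIVE_OP_PATTERN.matcher(s).replaceAll(" ");
        return DANGLING_OP_PATTERN.matcher(temp).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripIllegalOperators("ipod AND"));      // trailing operator
        System.out.println(stripIllegalOperators("OR vti OR bin")); // leading operator
    }
}
```

With something like this applied before parsing, both the trailing-operator query from the issue description and the leading-operator query from the comment would reach the Lucene parser as plain terms.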
[jira] Commented: (SOLR-1200) NullPointerException when unloading an absent core
[ https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716252#action_12716252 ] Peter Wolanin commented on SOLR-1200: - Do we need to open another issue (maybe for 1.5) - I'd think the expected behavior would be to throw a specific exception anywhere in core admin that a core is not found, and then catch it and return a 404? At the moment, however, you can request status for a non-existent core, etc, and get a 200 with some data, so this patch makes the behavior consistent, at least. NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Assignee: Noble Paul Priority: Minor Fix For: 1.4 Attachments: SOLR-1200.patch, SOLR-1200.patch Original Estimate: 1h Remaining Estimate: 1h When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1200) NullPointerException when unloading an absent core
NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Priority: Minor When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1200) NullPointerException when unloading an absent core
[ https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1200: Attachment: SOLR-1200.patch Here's a simple patch that follows the pattern in the other core admin methods. NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Priority: Minor Attachments: SOLR-1200.patch Original Estimate: 1h Remaining Estimate: 1h When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
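The "pattern in the other core admin methods" that the patch follows can be illustrated with a toy registry: look the core up first and fail with an explicit error instead of dereferencing null. `CoreRegistry` is a hypothetical stand-in for Solr's CoreContainer, and the error message is invented.

```java
import java.util.HashMap;
import java.util.Map;

public class CoreRegistry {
    private final Map<String, Object> cores = new HashMap<>();

    public void register(String name, Object core) {
        cores.put(name, core);
    }

    public Object unload(String name) {
        Object core = cores.remove(name);
        if (core == null) {
            // Without this guard, a later core.close() call would throw
            // the NullPointerException described in the issue.
            throw new IllegalArgumentException("No such core exists '" + name + "'");
        }
        return core;
    }
}
```

The point of the guard is that the caller gets a meaningful error (which the handler can turn into a proper HTTP error response) rather than a bare NullPointerException and a 500.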
[jira] Updated: (SOLR-1183) Example script not update for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1183: Attachment: SOLR-1183.patch Example script not update for new analysis path from SOLR-1099 -- Key: SOLR-1183 URL: https://issues.apache.org/jira/browse/SOLR-1183 Project: Solr Issue Type: Bug Components: Analysis Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1183.patch The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1183) Example script not updated for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1183: Description: The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document was: The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document Summary: Example script not updated for new analysis path from SOLR-1099 (was: Example script not update for new analysis path from SOLR-1099) Example script not updated for new analysis path from SOLR-1099 --- Key: SOLR-1183 URL: https://issues.apache.org/jira/browse/SOLR-1183 Project: Solr Issue Type: Bug Components: Analysis Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1183.patch The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1167) Support module xml config files using XInclude
[ https://issues.apache.org/jira/browse/SOLR-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710200#action_12710200 ] Peter Wolanin commented on SOLR-1167: - I think you posted a sample snippet for solrconfig to the list - can you repost here and possibly include in the patch a change to the sample schema or solrconfig that would demonstrate this feature? Support module xml config files using XInclude -- Key: SOLR-1167 URL: https://issues.apache.org/jira/browse/SOLR-1167 Project: Solr Issue Type: New Feature Reporter: Bryan Talbot Priority: Minor Attachments: SOLR-1167.patch Current configuration files (schema and solrconfig) are monolithic, which can make maintenance and reuse more difficult than it needs to be. The XML standards include a feature to include content from external files. This is described at http://www.w3.org/TR/xinclude/ This feature is to add support for XInclude features for XML configuration files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
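For reference, minimal XInclude usage in a solrconfig.xml might look like the following; this is a generic illustration of the W3C XInclude mechanism, the included file name handlers.xml is made up, and the exact form supported by the patch may differ.

{code:xml}
<?xml version="1.0"?>
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Pull a shared request-handler definition in from a separate file. -->
  <xi:include href="handlers.xml">
    <!-- Optional: what to use if handlers.xml cannot be loaded. -->
    <xi:fallback/>
  </xi:include>
</config>
{code}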
[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Description: In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. (was: In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users.) Summary: Document the new CopyField maxChars property in the example schema.xml (was: Document the new CopyField maxLength property in the example schema.xml) Document the new CopyField maxChars property in the example schema.xml -- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Attachment: SOLR-1151.patch revised patch to use maxChars - still not sure if this is a useful example, but at least adds some documentation of this property. Document the new CopyField maxChars property in the example schema.xml -- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch, SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
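The property being documented can be shown with a one-line schema.xml snippet; the field names body and teaser here are illustrative, not from the example schema.

{code:xml}
<!-- Copy at most the first 300 characters of body into teaser;
     without maxChars the whole field value is copied. -->
<copyField source="body" dest="teaser" maxChars="300"/>
{code}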
[jira] Created: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Attachment: SOLR-1151.patch 1st pass Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707211#action_12707211 ] Peter Wolanin commented on SOLR-1151: - needs work - the final format is maxChars NOT maxLength Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-341) PHP Solr Client
[ https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681893#action_12681893 ] Peter Wolanin commented on SOLR-341: r6 has been bundled into a release: http://code.google.com/p/solr-php-client/downloads/list We'll test this with the Drupal module soon, but it is likely to work fine. PHP Solr Client --- Key: SOLR-341 URL: https://issues.apache.org/jira/browse/SOLR-341 Project: Solr Issue Type: New Feature Components: clients - php Affects Versions: 1.2 Environment: PHP >= 5.2.0 (or older with JSON PECL extension or other json_decode function implementation). Solr >= 1.2 Reporter: Donovan Jimenez Priority: Trivial Fix For: 1.5 Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.2008-11-14.zip, SolrPhpClient.2008-11-25.zip, SolrPhpClient.zip Developed this client when the example PHP source didn't meet our needs. The company I work for agreed to release it under the terms of the Apache License. This version is slightly different from what I originally linked to on the dev mailing list. I've incorporated feedback from Yonik and hossman to simplify the client and only accept one response format (JSON currently). When Solr 1.3 is released the client can be updated to use the PHP or Serialized PHP response writer. 
example usage from my original mailing list post: {code}
<?php
require_once('Solr/Service.php');

$start = microtime(true);

$solr = new Solr_Service(); // Or explicitly new Solr_Service('localhost', 8180, '/solr');

try {
  $response = $solr->search('solr', 0, 10, array(/* you can include other parameters here */));

  echo 'search returned with status = ', $response->responseHeader->status,
    ' and took ', microtime(true) - $start, ' seconds', "\n";

  // here's how you would access results
  // Notice that I've mapped the values by name into a tree of stdClass objects
  // and arrays (actually, most of this is done by json_decode)
  if ($response->response->numFound > 0) {
    $doc_number = $response->response->start;

    foreach ($response->response->docs as $doc) {
      $doc_number++;
      echo $doc_number, ': ', $doc->text, "\n";
    }
  }

  // for the purposes of seeing the available structure of the response
  // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
  // any values are accessed may result in different behavior (in case
  // anyone has some troubles debugging)
  //print_r($response);
} catch (Exception $e) {
  echo $e->getMessage(), "\n";
}
?>
{code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12680288#action_12680288 ] Peter Wolanin commented on SOLR-196: This serialized writer produces output that is inconsistent with the other PHP writer adn inconsistent with the JSON A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12680288#action_12680288 ] Peter Wolanin edited comment on SOLR-196 at 3/9/09 2:33 PM: This serialized writer produces output that is inconsistent with the other PHP writer and inconsistent with the JSON. was (Author: pwolanin): This serialized writer produces output that is inconsistent with the other PHP writer adn inconsistent with the JSON A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680288#action_12680288 ] Peter Wolanin edited comment on SOLR-196 at 3/9/09 4:39 PM: This PHP writer is inconsistent with the JSON: if you use PHP 5's json_decode, maps come back as objects. was (Author: pwolanin): This serialized writer produces output that is inconsistent with the other PHP writer and inconsistent with the JSON. A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677517#action_12677517 ] Peter Wolanin commented on LUCENE-1500: --- Well, this patch does not (obviously) solve the real bug. Is it possible to combine #1 and #3, but possibly revert #3 later when we solve the real bug in the highlighter code? Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions: 2.4 Environment: Found this running the example code in Solr (latest version). Reporter: David Bowen Assignee: Michael McCandless Fix For: 2.4.1, 2.9 Attachments: LUCENE-1500.patch, patch.txt Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh): <add><doc> <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field> <field name="name">Some Name</field> <field name="manu">Acme, Inc.</field> <field name="features">Description of the features, mentioning various things</field> <field name="features">Features also is multivalued</field> <field name="popularity">6</field> <field name="inStock">true</field> </doc></add> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception. I have a patch. I don't know if it is completely correct, but it avoids this exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677531#action_12677531 ] Peter Wolanin commented on LUCENE-1500: --- The bug we are seeing now happens on pretty much every document that contains multi-byte characters, but only sometimes was it going past the end of the full string and hitting the exception. With the patch, the bug is still very evident; it just prevents the exception. It's a serious flaw in the highlighter - maybe it is using some non-UTF-8-aware method to calculate string lengths? Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677540#action_12677540 ] Peter Wolanin commented on LUCENE-1500: --- I am using Solr, but with a single value field. I'm using the current Solr build (includes the fix), so the bug I'm describing, which triggers the same exception as the prior Solr bug did, is still present and unrelated to SOLR-925. The extent of my tracing suggests it's coming when the token stream is generated, which looks to be part of the lucene highlighter: org.apache.lucene.search.highlight.TokenSources Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677561#action_12677561 ] Peter Wolanin commented on LUCENE-1500: --- I'm still trying to get a handle on how these pieces fit together, so sorry if I've jumped to the wrong conclusion. If the analyzer is where the offsets are calculated, then that sounds like the place to look. The field does use term vectors. The field uses this type from the Solr schema: {code} <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> {code} The full schema is http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&pathrev=DRUPAL-6--1 the field is {code} <field name="body" type="text" indexed="true" stored="true" termVectors="true"/> {code} in case it's relevant, the solrconfig is: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&pathrev=DRUPAL-6--1 Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677620#action_12677620 ] Peter Wolanin commented on LUCENE-1500: --- Ah, it occurs to me that we first saw this bug recently - and it seems likely it was only after starting to use: {code} <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> {code} for that field type. I will investigate more and post a SOLR issue. Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677629#action_12677629 ] Peter Wolanin commented on LUCENE-1500: ---

Koji - thanks - I was aware that not all tokenizers worked with the mapping filter, but I was apparently misinformed, since I was told that solr.HTMLStripWhitespaceTokenizerFactory was also suitable for use with a CharFilter. Indeed, your e-mail thread linked from SOLR-822 describes exactly the problem I have:

bq. As you can see, if you use CharFilter, Token offsets could be incorrect because CharFilters may convert 1 char to 2 chars or the other way around.

In the thread you suggest that this API could be added to Lucene Java?
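The offset drift Koji describes can be sketched in plain Java (no Lucene dependency; this is an illustrative toy, not Solr's actual code path): a char-filter-style mapping that turns one character into two shifts every subsequent offset, so token offsets computed against the filtered text no longer index the original.

```java
public class OffsetDriftDemo {
    public static void main(String[] args) {
        String original = "Straße ist";
        // A CharFilter-style mapping that converts 1 char to 2 ("ß" -> "ss")
        String filtered = original.replace("ß", "ss");

        // A tokenizer running on the filtered text reports "ist" at offset 8 ...
        int filteredStart = filtered.indexOf("ist");
        // ... but in the original text "ist" starts at offset 7.
        int originalStart = original.indexOf("ist");

        System.out.println("filtered offset: " + filteredStart);
        System.out.println("original offset: " + originalStart);
        // Highlighting the original at the uncorrected offset tags the wrong span:
        System.out.println("uncorrected span: " + original.substring(filteredStart));
    }
}
```

This is the mismatch that CharFilter's offset-correction API is meant to repair: without it, every mapped character before a token skews that token's reported span.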
[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677627#action_12677627 ] Peter Wolanin commented on SOLR-822:

Is there an issue for the CharStream API in Lucene? The e-mail thread looks like people were generally in support.

CharFilter - normalize characters before tokenizer
--
Key: SOLR-822
URL: https://issues.apache.org/jira/browse/SOLR-822
Project: Solr
Issue Type: New Feature
Components: Analysis
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 1.4
Attachments: character-normalization.JPG, sample_mapping_ja.txt, sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch

A new plugin which can be placed in front of <tokenizer/>.

{code:xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping_ja.txt"/>
    <tokenizer class="solr.MappingCJKTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

<charFilter/> can be multiple (chained). I'll post a JPEG file to show a character normalization sample soon.

MOTIVATION: In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and Morphological Analyzer. When we use a morphological analyzer, because the analyzer uses a Japanese dictionary to detect terms, we need to normalize characters before tokenization.

I'll post a patch soon, too.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676648#action_12676648 ] Peter Wolanin commented on LUCENE-1500: ---

Yes - this patch is not a fix, but a work-around. The root cause is clearly somewhere in the code generating the token stream - tokens seem to be getting positions in bytes rather than characters. DefaultSolrHighlighter.java has this code:

{code}
import org.apache.lucene.search.highlight.TokenSources;
...
// create TokenStream
try {
  // attempt term vectors
  if( tots == null )
    tots = new TermOffsetsTokenStream( TokenSources.getTokenStream(searcher.getReader(), docId, fieldName) );
  tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
} catch (IllegalArgumentException e) {
  // fall back to analyzer
  tstream = new TokenOrderingFilter(schema.getAnalyzer().tokenStream(fieldName, new StringReader(docTexts[j])), 10);
}
{code}
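The bytes-vs-characters hypothesis is easy to reproduce in isolation (plain Java with UTF-8 assumed; this is a standalone illustration, not Solr's actual code): every multi-byte character before a match pushes a byte-counted offset past the char-counted one.

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetDemo {
    public static void main(String[] args) {
        String text = "Gästezulauf und Drupaltalk";

        int charOffset = text.indexOf("Drupaltalk");                   // counted in chars
        int byteOffset = text.substring(0, charOffset)
                             .getBytes(StandardCharsets.UTF_8).length; // counted in bytes

        System.out.println("char offset: " + charOffset);
        System.out.println("byte offset: " + byteOffset);
        // Substring-ing the Java String with the byte offset drops the leading 'D':
        // the same forward drift per preceding multi-byte character.
        System.out.println("drifted: " + text.substring(byteOffset));
    }
}
```

Here the single "ä" makes the byte offset one larger than the char offset; German text with several umlauts before a token would drift the highlight several positions forward.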
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] Peter Wolanin commented on LUCENE-1500: ---

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 1:38 PM:

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}

So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 1:37 PM:

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}

So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] Peter Wolanin commented on LUCENE-1500: ---

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this stream already has start/end positions set. So, this patch would mitigate the exception, but it looks like the real bug is in Solr.
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 2:15 PM:

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this stream already has start/end positions set. So, this patch would mitigate the exception, but it looks like the real bug is in Solr, or perhaps elsewhere in Lucene where the token stream is constructed.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676449#action_12676449 ] Peter Wolanin commented on LUCENE-1500: ---

Actually - the initial patch does not avoid the exception I'm seeing, since the start of the token is OK but the end is beyond the string's end. Here is a slightly enhanced version that checks both the start and the end of the token.
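The attached patch itself isn't reproduced in this digest, but the guard described above amounts to clamping both ends of each token's reported span to the text length before Highlighter calls substring. A standalone sketch (the helper name is hypothetical, not from the actual patch):

```java
public class TokenSpanGuard {

    // Clamp a token's reported [start, end) offsets so substring cannot throw,
    // even when the end offset (e.g. 2822) points past the end of the text.
    static String safeSpan(String text, int start, int end) {
        int s = Math.min(Math.max(start, 0), text.length());
        int e = Math.min(Math.max(end, s), text.length());
        return text.substring(s, e);
    }

    public static void main(String[] args) {
        String text = "Features also is multivalued";
        System.out.println(safeSpan(text, 17, 2822));                // end clamped to text length
        System.out.println(safeSpan(text, 3000, 3010).isEmpty());    // fully out-of-range span
    }
}
```

As noted in the thread, this only suppresses the exception; the drifted offsets still come from wherever the token stream is built.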
[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated LUCENE-1500: --

Attachment: LUCENE-1500.patch