Hello, I'm using Solr 1.4 and trying to get the regex fragmenter to split text into basic sentences, but I'm running into a problem.
I'm using the default regex from the example Solr configuration:

    [-\w ,/\n\"']{20,200}

but with a larger fragment size (140) and a slop of 1.0. Given this passage:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue vitae, molestie quis nunc.

when I search for "Nulla" (the first word of the second sentence) and take the first highlighted snippet, this is what I get:

    . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus

As you can see, the snippet starts with the period from the previous sentence, and the period that ends the current sentence is missing.

I understand this regex isn't very advanced, but I've tried everything I can think of, regex-wise, and I always end up with the same problem. For example, I've tried:

    \w[^.!?]{0,200}[.!?]

which seems like it should include the ending punctuation, but it doesn't, so I think I'm missing something.

Does anybody know a regex that works?

-- 
Caleb Land
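As a sanity check on the patterns themselves, here is a minimal sketch that runs both regexes from the post against the raw passage. It uses Python's `re` module as a stand-in for `java.util.regex`; for this plain ASCII passage the two should behave the same on these patterns. The names `PASSAGE`, `DEFAULT_PATTERN`, `SENTENCE_PATTERN` and `raw_matches` are illustrative only. It only tests the patterns on plain text. Solr's regex fragmenter works over the analyzed token stream, so a pattern that matches cleanly here may still yield different snippet boundaries in Solr.

```python
import re

# The passage from the question, as a single string.
PASSAGE = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Nulla a neque a ipsum accumsan iaculis at id lacus. "
    "Sed magna velit, aliquam ut congue vitae, molestie quis nunc."
)

# Default pattern from the example Solr config. '.' is not in the
# character class, so every match stops just before a period.
DEFAULT_PATTERN = r"[-\w ,/\n\"']{20,200}"

# The sentence pattern tried in the question: start on a word character,
# run up to the first terminator, and include that terminator.
SENTENCE_PATTERN = r"\w[^.!?]{0,200}[.!?]"


def raw_matches(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text (plain regex, no tokenizer)."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for name, pattern in [("default", DEFAULT_PATTERN), ("sentence", SENTENCE_PATTERN)]:
        print(name)
        for match in raw_matches(pattern, PASSAGE):
            print(repr(match))
```

On the raw passage, the sentence pattern returns each of the three sentences with its closing period. The default pattern stops just before each period, and its second and third matches begin with the space that follows the previous period.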