[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Ack @PascalSchumacher , after @yufcuy 's feedback we can merge it, include it 
in the 3.x releases, and then think about where, and if, we should move the 
text-related code :-)

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #194: add isAllBlank,isNotAllBlank method for String "nul...

2016-10-07 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/194
  
Thanks for your contribution @wangdongxun !

>The method would be very confusing, as the String "null" is not all blank.

Agree with @PascalSchumacher . Adding this could indeed be very confusing. 
Future requests could ask for "'null'" (with single quotes inside), or "NULL", 
etc. It is easier done with two calls, as @PascalSchumacher suggested, IMO.

Unfortunately -1 as well.




[jira] [Commented] (CSV-198) Cannot parse file by header with custom delimiter on Java 1.6

2016-10-07 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556117#comment-15556117
 ] 

Emmanuel Bourg commented on CSV-198:


Tomcat has its own UTF-8 decoder due to bugs in the JDK, I wonder if you've hit 
one of those bugs with your input file.

https://github.com/apache/tomcat85/blob/TOMCAT_8_5_6/java/org/apache/tomcat/util/buf/Utf8Decoder.java


> Cannot parse file by header with custom delimiter on Java 1.6
> -
>
> Key: CSV-198
> URL: https://issues.apache.org/jira/browse/CSV-198
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.4
> Environment: Java 1.6, Windows
>Reporter: Tadhg Pearson
>
> Reading a CSV file from the parser using the header line results in 
> "IllegalArgumentException: Mapping for location_type not found" - even when the 
> column exists in the file. In this case, we are using Java 1.6 and the file 
> uses the ^ symbol to delimit columns. This works correctly in Java 7 & Java 8.
> The code required to reproduce the issue is below. You can find the 
> optd_por_public.csv file referenced at 
> https://raw.githubusercontent.com/opentraveldata/opentraveldata/master/opentraveldata/optd_por_public.csv
> It will need to be on the classpath to run the unit test.  Hope that helps, 
> Tadhg
> You should get the following output
> --
> java.lang.IllegalArgumentException: Mapping for location_type not found, 
> expected one of 
> [iata_code,icao_code,faa_code,is_geonames,geoname_id,envelope_id,name,asciiname,latitude,longitude,fclass,fcode,page_rank,date_from,date_until,comment,country_code,cc2,country_name,continent_name,adm1_code,adm1_name_utf,adm1_name_ascii,adm2_code,adm2_name_utf,adm2_name_ascii,adm3_code,adm4_code,population,elevation,gtopo30,timezone,gmt_offset,dst_offset,raw_offset,moddate,city_code_list,city_name_list,city_detail_list,tvl_por_list,state_code,location_type,wiki_link,alt_name_section,wac,wac_name]
>   at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:104)
>   at com.amadeus.ui.CSVRecordTest.test(CSVRecordTest.java:31)
> 
> import static org.junit.Assert.assertNotNull;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.io.UnsupportedEncodingException;
> import org.apache.commons.csv.CSVFormat;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVRecord;
> import org.junit.Test;
> public class CSVRecordTest {
>   private static final CSVFormat CSV_FORMAT = 
> CSVFormat.EXCEL.withDelimiter('^').withFirstRecordAsHeader();
>   
>   @Test
>   public void test() throws UnsupportedEncodingException, IOException {
>   InputStream pointsOfReference = 
> getClass().getResourceAsStream("/optd_por_public.csv");
>   CSVParser parser = CSV_FORMAT.parse(new 
> InputStreamReader(pointsOfReference, "UTF-8"));
>   for (CSVRecord record : parser) {
>   String locationType = record.get("location_type");
>   assertNotNull(locationType);
>   }
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CONFIGURATION-640) Colon in properties file value no longer unescaped in commons configuration 2

2016-10-07 Thread Oliver Heger (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONFIGURATION-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Heger resolved CONFIGURATION-640.

   Resolution: Fixed
Fix Version/s: 2.2

A fix has been applied in revision 1763821.

In Configuration 1.x, unescaping was done for all characters except for the 
list delimiter character. This was changed for Configuration 2.x because here 
the list delimiter character is not known to the configuration, but is managed 
by the ListDelimiterHandler. As a side-effect, the unescaping got lost.

The fix now adds special handling for a number of characters that the Javadocs 
of the _Properties.store()_ method list as always being escaped:
http://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#store-java.io.Writer-java.lang.String-
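For comparison, java.util.Properties itself always unescapes these characters on load; a minimal JDK-only sketch (not using Commons Configuration) of the behavior the fix restores:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ColonUnescapeDemo {
    // Loads a property whose value was stored with an escaped colon, exactly
    // as java.util.Properties.store() writes it: test2=C\:\\test
    public static String load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader("test2=C\\:\\\\test"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("test2"); // the loader unescapes the value
    }
}
```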

> Colon in properties file value no longer unescaped in commons configuration 2
> -
>
> Key: CONFIGURATION-640
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-640
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Tom Byttebier
>Priority: Minor
> Fix For: 2.2
>
>
> A properties file created with Java escapes a colon in, for example, a path 
> like C:\test as {noformat}C\:\\test{noformat}
> When reading this property value in Commons Configuration 1, the colon is 
> unescaped, giving C:\test.
> When reading the property value in Commons Configuration 2, the colon is no 
> longer unescaped, giving C\:\test.
> Snippet of the code I used for reading the property:
> {code}
> final FileBasedConfigurationBuilder<PropertiesConfiguration> builder =
>     new FileBasedConfigurationBuilder<>(PropertiesConfiguration.class)
>         .configure(new Parameters().properties().setFile(path.toFile()));
> final PropertiesConfiguration propertiesConfiguration = builder.getConfiguration();
> Assert.assertEquals("C:\\test", propertiesConfiguration.getString("test2"));
> {code}
> I've read this 
> [section|http://commons.apache.org/proper/commons-configuration/userguide_v1.10/howto_properties.html#Special_Characters_and_Escaping]
>  so I'm aware of the changes to escaping, but I'm not sure how the escaping 
> of the colon fits into this and if there is a way around this.





[jira] [Commented] (CONFIGURATION-639) OSGi Import-Package declaration not complete regarding optional dependencies

2016-10-07 Thread Oliver Heger (JIRA)

[ 
https://issues.apache.org/jira/browse/CONFIGURATION-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556099#comment-15556099
 ] 

Oliver Heger commented on CONFIGURATION-639:


[~b.eckenfels] Do you already have a working solution you can share?

> OSGi Import-Package declaration not complete regarding optional dependencies
> 
>
> Key: CONFIGURATION-639
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-639
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.1
> Environment: OSGi-container
>Reporter: Rico Neubauer
>  Labels: commons-configuration, osgi
> Attachments: s1.txt, s2.txt
>
>
> commons-configuration2's pom.xml correctly defines optional dependencies like 
> vfs2 or spring with {{<optional>true</optional>}}.
> However, it only declares a subset of those as optional OSGi dependencies:
> {code:xml}
> <Import-Package>
>   org.apache.commons.beanutils.*;resolution:=optional,
>   org.apache.commons.codec.*;resolution:=optional,
>   org.apache.commons.jxpath.*;resolution:=optional,
>   org.apache.xml.resolver.*;resolution:=optional,
>   javax.servlet.*;resolution:=optional,
>   org.apache.commons.jexl2.*;resolution:=optional,
>   org.apache.commons.vfs2.*;resolution:=optional,
>   *
> </Import-Package>
> {code}
> See https://github.com/apache/commons-configuration/blob/trunk/pom.xml for 
> both of the above.
> Due to the missing "resolution:=optional" declarations, commons-configuration2 
> cannot be deployed in an OSGi environment that does not provide the optional 
> bundles.
> Example error on deploy:
> {code}
> Unable to resolve Module[org.apache.commons.configuration:2.1.0]: missing 
> requirement [Module[org.apache.commons.configuration:2.1.0]] package; 
> (package=org.springframework.beans.factory)
> {code}
> Please have a look, and if you agree, add the missing instructions for the 
> remaining optional dependencies.
> A manually fixed Import-Package statement looks like this (disregarding 
> line breaks):
> {code}
> Import-Package: 
> javax.naming,javax.servlet;resolution:=optional,javax.sql,javax.xml.parsers,javax.xml.transform,javax.xml.transform.dom,javax.xml.transform.stream,org.apache.commons.beanutils;resolution:=optional,org.apache.commons.codec.binary;resolution:=optional,org.apache.commons.jexl2;resolution:=optional,org.apache.commons.jxpath;resolution:=optional,org.apache.commons.jxpath.ri;resolution:=optional,org.apache.commons.jxpath.ri.compiler;resolution:=optional,org.apache.commons.jxpath.ri.model;resolution:=optional,org.apache.commons.lang3;version="[3.3,4)",org.apache.commons.lang3.builder;version="[3.3,4)",org.apache.commons.lang3.concurrent;version="[3.3,4)",org.apache.commons.lang3.mutable;version="[3.3,4)",org.apache.commons.lang3.text;version="[3.3,4)",org.apache.commons.lang3.text.translate;version="[3.3,4)",org.apache.commons.logging;version="[1.2,2)",org.apache.commons.logging.impl;version="[1.2,2)",org.apache.commons.vfs2;resolution:=optional,org.apache.commons.vfs2.provider;resolution:=optional,org.apache.xml.resolver;resolution:=optional,org.apache.xml.resolver.helpers;resolution:=optional,org.apache.xml.resolver.readers;resolution:=optional,org.apache.xml.resolver.tools;resolution:=optional,org.springframework.beans.factory;resolution:=optional,org.springframework.core.env;resolution:=optional,org.springframework.core.io;resolution:=optional,org.springframework.util;resolution:=optional,org.w3c.dom,org.xml.sax,org.xml.sax.helpers
> {code}





[jira] [Commented] (CSV-196) Store the info of whether a field is enclosed by quotes

2016-10-07 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556070#comment-15556070
 ] 

Emmanuel Bourg commented on CSV-196:


But the parser reads characters, not bytes. That doesn't seem very reliable, 
and I suspect there are other cases where the parser ignores input characters 
(such as spaces between a delimiter and a quote). Flagging the quoted fields is 
likely to be insufficient.
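The character/byte mismatch is easy to see: any non-ASCII field makes the character count diverge from the UTF-8 byte count, so a quoted-field flag alone cannot recover exact byte offsets. A minimal JDK sketch:

```java
import java.nio.charset.StandardCharsets;

public class CharVsByteCount {
    // Number of characters the parser sees for a field.
    public static int charCount(String field) {
        return field.length();
    }

    // Number of bytes that field occupied in a UTF-8 encoded file;
    // differs from charCount whenever the field contains non-ASCII text.
    public static int byteCount(String field) {
        return field.getBytes(StandardCharsets.UTF_8).length;
    }
}
```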

> Store the info of whether a field is enclosed by quotes
> ---
>
> Key: CSV-196
> URL: https://issues.apache.org/jira/browse/CSV-196
> Project: Commons CSV
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 1.4
>Reporter: Matt Sun
>  Labels: easyfix, features, patch
> Fix For: Patch Needed
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be good to have the CSVParser class store the info of whether a field 
> was enclosed by quotes in the original source file.
> For example, for this data sample:
> A, B, C
> a1, "b1", c1
> CSVParser gives us the record a1, b1, c1, which is helpful because it parsed the 
> double quotes, but we also lost information about the original data at the same 
> time. We can't tell from the returned CSVRecord whether the original data was 
> enclosed in double quotes or not.
> In our use case, we are integrating Apache Hadoop APIs with Commons CSV. CSV 
> is one kind of input to Hadoop jobs, which should support splitting input 
> data. To accurately split a CSV file into pieces, we need to count the bytes 
> of data CSVParser actually read. CSVParser doesn't have accurate information 
> about whether a field was enclosed by quotes, nor does it store the raw data of 
> the original source. Downstream users of the Commons CSVParser are not able to 
> get that info.
> To suggest a fix: Extend the token/CSVRecord to have a boolean field 
> indicating whether the column was enclosed by quotes. While Lexer is doing 
> getNextToken, set the flag if a field is encapsulated and successfully parsed.
> I found another issue reported with a similar request, but it was marked as 
> resolved: [CSV-91] 
> https://issues.apache.org/jira/browse/CSV-91?jql=project%20%3D%20CSV%20AND%20text%20~%20%22with%20quotes%22





[jira] [Resolved] (COMPRESS-337) TAR header parsing attempts OLDGNU format on POSIX/STAR header

2016-10-07 Thread Stefan Bodewig (JIRA)

 [ 
https://issues.apache.org/jira/browse/COMPRESS-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-337.
-
Resolution: Won't Fix

actually "can't fix". Somebody seems to be creating invalid archives and 
neither we nor GNU tar can read them.

> TAR header parsing attempts OLDGNU format on POSIX/STAR header
> --
>
> Key: COMPRESS-337
> URL: https://issues.apache.org/jira/browse/COMPRESS-337
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.10
>Reporter: Jeremy Gustie
>
> This is at least tangentially related to COMPRESS-336: we found an archive 
> which is misinterpreted as an OLDGNU header format when it appears to be some 
> type of STAR header (unlike COMPRESS-336, this one uses a NUL delimiter 
> instead of space terminators for the time stamps). Apparently the archive 
> itself comes from an [EyeFi 
> card|https://github.com/golang/go/issues/5290](?!).
> Both the original archive ([available 
> here|https://storage.googleapis.com/go-attachment/5290/0/in.tar]) and the 
> (presumably hand made) version for testing 
> ([here|https://github.com/golang/go/raw/master/src/archive/tar/testdata/nil-uid.tar])
>  exhibit the same behavior: they fail with an {{IllegalArgumentException}} 
> attempting to parse the old GNU {{realSize}} from a buffer partially filled 
> with the {{atime}}/{{ctime}} values.
> At first I thought if the {{TarArchiveEntry.evaluateType}} were to also look 
> at the {{version}} field for {{VERSION_GNU_SPACE}} before returning 
> {{FORMAT_OLDGNU}} it _should_ be enough (according to something I saw in [the 
> GNU documentation|http://www.gnu.org/software/tar/manual/tar.html#SEC184] 
> which defines {{OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */}}). 
> Unfortunately it looks like that is the magic/version used by the archives in 
> question, leaving me stumped as to how to identify which of the three 
> prevailing header formats needs to be parsed.
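For reference, the bytes in question sit at offset 257 of the 512-byte header: POSIX ustar writes the magic "ustar" NUL plus version "00", while old GNU tar writes "ustar  " (7 chars and a NUL) across both fields. A hedged sketch of a classifier over those eight bytes — which, as noted above, cannot distinguish the archives in question from genuine OLDGNU ones:

```java
import java.util.Arrays;

public class TarMagicSniffer {
    private static final int MAGIC_OFFSET = 257; // magic (6 bytes) + version (2 bytes)

    // POSIX ustar: "ustar" NUL, then version "00"
    private static final byte[] POSIX = {'u', 's', 't', 'a', 'r', 0, '0', '0'};
    // Old GNU: OLDGNU_MAGIC "ustar  " (7 chars and a NUL) spanning both fields
    private static final byte[] OLDGNU = {'u', 's', 't', 'a', 'r', ' ', ' ', 0};

    // Classifies a raw 512-byte tar header block by its magic/version bytes.
    public static String classify(byte[] header) {
        byte[] bytes = Arrays.copyOfRange(header, MAGIC_OFFSET, MAGIC_OFFSET + 8);
        if (Arrays.equals(bytes, POSIX)) {
            return "POSIX/ustar";
        }
        if (Arrays.equals(bytes, OLDGNU)) {
            return "OLDGNU";
        }
        return "V7/unknown";
    }
}
```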





[jira] [Commented] (COMPRESS-294) .Z decompress “Invalid 9 bit code 0x183”

2016-10-07 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1538#comment-1538
 ] 

Stefan Bodewig commented on COMPRESS-294:
-

[~theqmaster] any updates?

> .Z decompress “Invalid 9 bit code 0x183”
> 
>
> Key: COMPRESS-294
> URL: https://issues.apache.org/jira/browse/COMPRESS-294
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.9
>Reporter: Q
> Attachments: commons-compress-1.10-SNAPSHOT.jar
>
>
> Trying to decompress a .Z file I get “Invalid 9 bit code 0x183”
> It seems that the .Z file was created under Unix using the default bits value 
> (16 bits). The current implementation seems to support only 9 bits.
> (I can't provide a sample file since it contains client data, but I will try to 
> get a dummy file from the client.)
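Whether a .Z file was really written with 16-bit codes can be checked without decompressing it: in the ncompress format the third header byte carries the maximum code width in its low five bits. A minimal JDK-only sketch:

```java
public class ZHeaderInfo {
    // Reads the declared maximum LZW code width from a .Z (ncompress) header:
    // two magic bytes 0x1F 0x9D, then a flag byte whose low 5 bits hold maxbits.
    public static int maxBits(byte[] header) {
        if (header.length < 3
                || (header[0] & 0xFF) != 0x1F
                || (header[1] & 0xFF) != 0x9D) {
            throw new IllegalArgumentException("not a .Z file");
        }
        return header[2] & 0x1F;
    }
}
```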





[jira] [Commented] (CSV-196) Store the info of whether a field is enclosed by quotes

2016-10-07 Thread Matt Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1534#comment-1534
 ] 

Matt Sun commented on CSV-196:
--

Yes, pretty much: the bytes read from the source file, including encapsulators, 
spaces, line breaks, etc.






[jira] [Commented] (COMPRESS-358) Offset is larger than block size in IWA dialect of FramedSnappyCompressorInputStream

2016-10-07 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1531#comment-1531
 ] 

Stefan Bodewig commented on COMPRESS-358:
-

[~talli...@mitre.org] any update on the files?

> Offset is larger than block size in IWA dialect of 
> FramedSnappyCompressorInputStream
> 
>
> Key: COMPRESS-358
> URL: https://issues.apache.org/jira/browse/COMPRESS-358
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.12
>Reporter: Tim Allison
>Priority: Trivial
> Attachments: DocumentStylesheet.iwa, 
> DocumentStylesheet_uncompressed.iwa
>
>
> I finally was able to run the FramedSnappyCompressorInputStream on a larger 
> number of .iwa files.  I got the following on two files:
> {noformat}
> java.io.IOException: Offset is larger than block size
>   at 
> org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.expandCopy(SnappyCompressorInputStream.java:341)
>   at 
> org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.fill(SnappyCompressorInputStream.java:212)
>   at 
> org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.read(SnappyCompressorInputStream.java:134)
>   at 
> org.apache.commons.compress.compressors.snappy.FramedSnappyCompressorInputStream.readOnce(FramedSnappyCompressorInputStream.java:166)
>   at 
> org.apache.commons.compress.compressors.snappy.FramedSnappyCompressorInputStream.read(FramedSnappyCompressorInputStream.java:122)
>   at java.io.InputStream.read(InputStream.java:101)
>   at java.nio.file.Files.copy(Files.java:2908)
>   at java.nio.file.Files.copy(Files.java:3027)
> {noformat}
> No good deed goes unpunished... :)  Thank you, again!





[jira] [Resolved] (COMPRESS-364) ZipArchiveInputStream.closeEntry does not properly advance to next entry if there are junk bytes at end of data section

2016-10-07 Thread Stefan Bodewig (JIRA)

 [ 
https://issues.apache.org/jira/browse/COMPRESS-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-364.
-
   Resolution: Fixed
Fix Version/s: 1.13

Thanks a lot, Mike, and sorry for the delay.

I've applied your patch, it will be part of the next release of compress.

> ZipArchiveInputStream.closeEntry does not properly advance to next entry if 
> there are junk bytes at end of data section
> ---
>
> Key: COMPRESS-364
> URL: https://issues.apache.org/jira/browse/COMPRESS-364
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.12
>Reporter: Michael Mole
>Priority: Minor
> Fix For: 1.13
>
> Attachments: 
> 0001-COMPRESS-364-ZipArchiveInputStream.closeEntry-fails-.patch
>
>
> ZipArchiveInputStream.closeEntry() will not properly advance to the next 
> entry causing the next call to getNextZipEntry to incorrectly return null if 
> there are junk bytes at the end of the compressed data section.
> More specifically, I found a case where the first entry's local header says 
> that its compressed data size is 620 bytes. There are in fact 620 bytes 
> before the next local header. However, when the compressed data is inflated, 
> it only requires 618 of the 620 bytes to fully inflate (i.e. before it 
> encounters the DEFLATE end of data code). This means that there is complete 
> DEFLATE compressed data + extra garbage bytes after it, all within the 
> specified zip entry data section.
> The commons compress ZipArchiveInputStream streaming implementation doesn't 
> exactly read on zip entry boundaries, but instead it reads 512 bytes at a 
> time. As a result it tends to read more bytes than necessary per entry and 
> then seek back to the beginning of the next entry. When it seeks back, it 
> assumes that number of bytes that were required to be read to reach the end 
> of the zip entry is the same as the number of bytes needed to inflate the 
> data. However, that assumption does not hold up in this case: 620 bytes need 
> to be read to reach the end of the zip entry, but only 618 were needed to 
> inflate the data.  After the pushback, the closeEntry() method should perform 
> a final drain of the remaining bytes to reach the next local file header.
> I've created a test case and fix.  I will submit a pull request shortly.
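The mechanics described above are easy to reproduce with the JDK's own Deflater/Inflater (an illustration of the failure mode, not the commons-compress code): append junk after a complete raw DEFLATE stream and the inflater finishes without consuming the trailing bytes — exactly the gap closeEntry() has to drain.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TrailingGarbageDemo {

    // Raw-deflates data, as zip entries store it (nowrap = true).
    public static byte[] deflateRaw(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    // Inflates a buffer holding a complete raw DEFLATE stream followed by junk
    // bytes, and reports how many input bytes the inflater never consumed.
    public static int unconsumedBytes(byte[] input, int uncompressedSize) {
        Inflater inflater = new Inflater(true); // raw deflate, no zlib wrapper
        inflater.setInput(input);
        byte[] out = new byte[uncompressedSize + 64]; // slack so the stream can finish
        int written = 0;
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(out, written, out.length - written);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated stream; bail out rather than loop forever
                }
                written += n;
            }
            int remaining = inflater.getRemaining(); // junk the stream never needed
            inflater.end();
            return remaining;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }
}
```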





[jira] [Resolved] (COMPRESS-366) TarArchiveEntry: getDirectoryEntries not working

2016-10-07 Thread Stefan Bodewig (JIRA)

 [ 
https://issues.apache.org/jira/browse/COMPRESS-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-366.
-
Resolution: Fixed

fixed with git commit f010260

> TarArchiveEntry: getDirectoryEntries not working
> 
>
> Key: COMPRESS-366
> URL: https://issues.apache.org/jira/browse/COMPRESS-366
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.12
> Environment: Eclipse 4.6, Linux 64 Bit
>Reporter: Casi Colada
>Priority: Minor
> Fix For: 1.13
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> TarArchiveEntry.getDirectoryEntries() always returns an empty array. This is 
> because entry.getFile() returns null for a directory entry.
> Let folder.tar be a Tar Archive which contains a folder, and that folder 
> contains a file. Consider the following snippet:
> {code}
> import java.io.FileInputStream;
> import org.apache.commons.compress.archivers.tar.*;
> 
> public class GetDirectoryEntriesBug {
>     public static void main(String[] args) throws Exception {
>         TarArchiveInputStream tais = new TarArchiveInputStream(new FileInputStream("folder.tar"));
>         for (TarArchiveEntry entry; (entry = tais.getNextTarEntry()) != null; ) {
>             System.out.println("Name: " + entry.getName()
>                     + ", isDirectory: " + entry.isDirectory()
>                     + ", getDirectoryEntries().length: " + entry.getDirectoryEntries().length);
>         }
>         tais.close();
>     }
> }
> {code}
> Output:
> Name: folder/file, isDirectory: false, getDirectoryEntries().length: 0
> Name: folder/, isDirectory: true, getDirectoryEntries().length: 0
> I expected that, for "folder/", getDirectoryEntries() would not return an 
> empty array.
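Since entry.getFile() is null for streamed entries, a common workaround is to collect the entry names while streaming and group them by parent directory yourself. A hedged, name-only sketch of that grouping (minus the commons-compress stream):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DirectoryGrouper {
    // Groups tar entry names under their immediate parent directory,
    // e.g. "folder/file" is listed under "folder/".
    public static Map<String, List<String>> group(List<String> entryNames) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String name : entryNames) {
            boolean isDir = name.endsWith("/");
            String trimmed = isDir ? name.substring(0, name.length() - 1) : name;
            int slash = trimmed.lastIndexOf('/');
            String parent = slash < 0 ? "" : trimmed.substring(0, slash + 1);
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(name);
            if (isDir) {
                // Make sure every directory entry appears as a key, even if empty.
                children.computeIfAbsent(name, k -> new ArrayList<>());
            }
        }
        return children;
    }
}
```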





[GitHub] commons-lang issue #194: add isAllBlank,isNotAllBlank method for String "nul...

2016-10-07 Thread PascalSchumacher
Github user PascalSchumacher commented on the issue:

https://github.com/apache/commons-lang/pull/194
  
Thanks for the pull request.

The method body can be shortened to `StringUtils.isBlank(cs) || 
StringUtils.equalsIgnoreCase(cs, "null")`.

The method would be very confusing, as the String `"null"` is not all blank.

I'm sorry, but I'm -1 to adding this.
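For reference, the suggested one-liner can be sketched without commons-lang at all; `isBlankOrLiteralNull` is a hypothetical name, and `isBlank` mirrors what StringUtils.isBlank does:

```java
public class NullStringCheck {
    // Hypothetical helper: plain-JDK equivalent of the suggested one-liner
    // StringUtils.isBlank(cs) || StringUtils.equalsIgnoreCase(cs, "null").
    public static boolean isBlankOrLiteralNull(CharSequence cs) {
        return isBlank(cs) || "null".equalsIgnoreCase(cs.toString());
    }

    // Mirrors StringUtils.isBlank: true for null, empty, or whitespace only.
    public static boolean isBlank(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```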




[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread PascalSchumacher
Github user PascalSchumacher commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
@kinow I think this has to stay in lang. At least until lang 4.




[jira] [Commented] (IO-499) FilenameUtils.directoryContains(String, String) gives false positive when two directories exist with equal prefixes

2016-10-07 Thread Federico Bonelli (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1171#comment-1171
 ] 

Federico Bonelli commented on IO-499:
-

@cagdasyelen I'm afraid this patch doesn't consider the case where the file path 
is composed using '\' separators (i.e. the Windows case). We should adapt it to 
that case before merging the patch.

> FilenameUtils.directoryContains(String, String) gives false positive when two 
> directories exist with equal prefixes
> ---
>
> Key: IO-499
> URL: https://issues.apache.org/jira/browse/IO-499
> Project: Commons IO
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Federico Bonelli
>Priority: Minor
>
> In a folder layout as such:
> {code}
> /foo/a.txt
> /foo2/b.txt
> {code}
> The result of invoking directoryContains is wrong:
> {code}
> FilenameUtils.directoryContains("/foo", "/foo2/b.txt"); // returns true
> {code}
> even if "/foo" and "/foo2/b.txt" are the canonical paths, they start with the 
> same characters, and the current implementation of the method fails.
> As workaround we are currently appending a path separator '/' to the first 
> argument.
> It is noteworthy that the current implementation of 
> FileUtils.directoryContains() reveals this issue because it uses the 
> File.getCanonicalPath() to obtain the String paths of "/foo" and 
> "/foo2/b.txt".





[jira] [Commented] (CSV-198) Cannot parse file by header with custom delimiter on Java 1.6

2016-10-07 Thread Tadhg Pearson (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1014#comment-1014
 ] 

Tadhg Pearson commented on CSV-198:
---

I suspected the same thing, but was then unable to find any evidence to verify 
that this was the case... what leads you to this suspicion?






[jira] [Commented] (CSV-199) CSVFormat option to defend against CSV Excel Macro Injection (CEMI) attacks

2016-10-07 Thread Phil Varner (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554909#comment-15554909
 ] 

Phil Varner commented on CSV-199:
-

I'm primarily thinking about the CSVPrinter output use case, rather than the 
input case.  I don't have any evidence, but I would expect that a good 
percentage of the usage of CSVPrinter is for data that will be consumed by an 
end user using a spreadsheet rather than processed by a non-spreadsheet 
application.  

None of the other predefined formats would need this: exporting to the 
MySQL format wouldn't be a vector, since MySQL wouldn't execute the 
formulas. Likewise with the default format, as that should be the generic case 
where the CSV isn't anticipated to be imported into Excel. 

I'm also not sure as to whether commons-csv should be responsible for this, but 
from my own experience (which lead me to commons-csv), I was fixing an old bug 
a webapp where the CSV was constructed using string concatenation, so values 
weren't escaped at all, so the output CSV could easily be broken by a 
user-generated input starting with '",'.  After converting this to use 
commons-csv, I was surprised that the formulas still weren't escaped, and I had 
to do that manually in my code.  I think it would be nice to both make it 
apparent to users that this is something that can happen, and have a flag to 
set defense against it. 
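The escaping being asked for here is simple enough to sketch in plain Java. The following is a hypothetical illustration of what an `escapeFormulas` flag might do, not Commons CSV code; the class and method names are invented for the example.

```java
// Hypothetical sketch of the proposed "escapeFormulas" behaviour: prepend a
// single quote to any value whose first character Excel treats as a formula
// trigger. Excel then renders the value as literal text instead of executing it.
public class FormulaEscaper {

    // Characters commonly cited (e.g. by OWASP) as Excel formula triggers.
    private static final String TRIGGERS = "=+-@";

    static String escapeFormula(String value) {
        if (value != null && !value.isEmpty()
                && TRIGGERS.indexOf(value.charAt(0)) >= 0) {
            return "'" + value;  // '=1+1 displays as the text "=1+1"
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(escapeFormula("=1+1"));            // '=1+1
        System.out.println(escapeFormula("Aaron Aaronson"));  // Aaron Aaronson
    }
}
```

A printer wrapper could apply this to every value before handing it to CSVPrinter, which is essentially the manual fix described above.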



> CSVFormat option to defend against CSV Excel Macro Injection (CEMI) attacks
> ---
>
> Key: CSV-199
> URL: https://issues.apache.org/jira/browse/CSV-199
> Project: Commons CSV
>  Issue Type: New Feature
>  Components: Printer
>Affects Versions: 1.4
>Reporter: Phil Varner
>
> A common use for Commons CSV is to export user-generated data for analysis in 
> spreadsheet software like Excel.  One attack against this usage is for a user 
> to create data that appears to Excel as a formula, such that Excel executes 
> it.  For example, a simple non-malicious example of this is a CSV file like:
> {code}
> Name,Email,Favorite Color
> Aaron Aaronson,a...@example.com,=1+1
> {code}
> When opened, Excel will execute the macro and display "2".  A malicious 
> example could, for example, use "=cmd|' /C calc'!A0", causing a command 
> prompt to be opened. 
> This can be exploited with values starting with =, +, -, or @.
> This feature would add a flag to CSVFormat called "escapeFormulas" that would 
> defend against creating vulnerable CSV files like this by prepending a 
> single-quote to any CSV column value starting with the four aforementioned 
> characters.  Also added would be a predefined format EXCEL_WITHOUT_FORMULAS 
> that could be used for safely exporting data that was not intended to contain 
> formulas. 
> I believe it is important to add this as a feature to CSVFormat rather than 
> relying on users to manually escape formulas because many users do not know 
> about this security vulnerability, but would prefer to defend against it if 
> aware. 
> More information:
> https://www.owasp.org/index.php/CSV_Excel_Macro_Injection
> https://hackerone.com/reports/72785
> http://www.contextis.com/resources/blog/comma-separated-vulnerabilities/





[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Hi @yufcuy,

Sorry for the delay to look into this.

I looked at the first two implementations this morning to refresh my memory: the 
first one builds the whole comparison table, and the second one keeps just the 
current and previous rows. Both are described in the Wikipedia page you linked 
(thanks for that).

Then I started reviewing your pull request, and I believe your 
implementation is correct :-) though I couldn't find a description of this 
exact algorithm on Wikipedia. The best I could find was this page: 
http://blog.softwx.net/2014/12/optimizing-levenshtein-algorithm-in-c.html

That page describes the single-array approach. We could link to it in the 
Javadoc, as we did for the previous two implementations. What do you 
think?
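For reference, here is a minimal sketch of the single-array variant under discussion, along the lines of the SoftWX post linked above. This is illustrative only, not the exact code in the pull request:

```java
// Single-row Levenshtein distance: instead of the full n x m table, keep one
// row of length m+1 plus a scalar holding the diagonal cell d[i-1][j-1].
public class Levenshtein {

    static int distance(CharSequence left, CharSequence right) {
        int n = left.length(), m = right.length();
        if (n == 0) return m;
        if (m == 0) return n;

        int[] d = new int[m + 1];            // reused row of the DP table
        for (int j = 0; j <= m; j++) d[j] = j;

        for (int i = 1; i <= n; i++) {
            int prev = d[0];                 // d[i-1][j-1] before it is overwritten
            d[0] = i;
            for (int j = 1; j <= m; j++) {
                int above = d[j];            // d[i-1][j], saved for next diagonal
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                // deletion, insertion, substitution
                d[j] = Math.min(Math.min(above + 1, d[j - 1] + 1), prev + cost);
                prev = above;
            }
        }
        return d[m];
    }
}
```

For example, `distance("kitten", "sitting")` yields 3, matching the classic worked example from the Wikipedia page.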

There are other trivial changes regarding spelling, typos, tabs vs. spaces, 
etc., so I will add a few more comments. But all tests are passing, I see no 
regression (feature-wise or in performance), and if you agree with the minor 
adjustments we may have to make, then I believe we are ready to merge it.

@britter should we keep it in [lang], or in [text]? Either way, I will 
replicate the change in the Levenshtein implementation in [text] :-)

Cheers
Bruno




[jira] [Commented] (CSV-198) Cannot parse file by header with custom delimiter on Java 1.6

2016-10-07 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554432#comment-15554432
 ] 

Emmanuel Bourg commented on CSV-198:


I suspect this could be a UTF-8 decoding bug at the Java level; if so, I'm not 
sure we can fix it.
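One way to test that suspicion would be to decode the header bytes with a strict CharsetDecoder, so that any malformed input fails loudly instead of being silently replaced. This is a hedged diagnostic sketch, not a fix, and it is written against Java 7+ APIs (`StandardCharsets`) for brevity even though the report concerns Java 6:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Strict UTF-8 decoding: REPORT makes malformed or unmappable byte sequences
// throw CharacterCodingException instead of substituting replacement chars,
// which would expose a decoder difference between JVM versions.
public class StrictUtf8Check {

    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }
}
```

Running the header line of the problem file through such a decoder on both Java 6 and Java 8 would show whether the bytes decode to different strings on the two runtimes.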

> Cannot parse file by header with custom delimiter on Java 1.6
> -
>
> Key: CSV-198
> URL: https://issues.apache.org/jira/browse/CSV-198
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.4
> Environment: Java 1.6, Windows
>Reporter: Tadhg Pearson
>
> Reading a CSV file from the parser using the header line results in 
> "IllegalArgumentException: Mapping for location_type not found" - even when the 
> column exists in the file. In this case, we are using Java 1.6 and the file 
> uses the ^ symbol to delimit columns. This works correctly in Java 7 & Java 8.
> The code required to reproduce the issue is below. You can find the 
> optd_por_public.csv file referenced at 
> https://raw.githubusercontent.com/opentraveldata/opentraveldata/master/opentraveldata/optd_por_public.csv
> It will need to be on the classpath to run the unit test.  Hope that helps, 
> Tadhg
> You should get the following output
> --
> java.lang.IllegalArgumentException: Mapping for location_type not found, 
> expected one of 
> [iata_code,icao_code,faa_code,is_geonames,geoname_id,envelope_id,name,asciiname,latitude,longitude,fclass,fcode,page_rank,date_from,date_until,comment,country_code,cc2,country_name,continent_name,adm1_code,adm1_name_utf,adm1_name_ascii,adm2_code,adm2_name_utf,adm2_name_ascii,adm3_code,adm4_code,population,elevation,gtopo30,timezone,gmt_offset,dst_offset,raw_offset,moddate,city_code_list,city_name_list,city_detail_list,tvl_por_list,state_code,location_type,wiki_link,alt_name_section,wac,wac_name]
>   at org.apache.commons.csv.CSVRecord.get(CSVRecord.java:104)
>   at com.amadeus.ui.CSVRecordTest.test(CSVRecordTest.java:31)
> 
> import static org.junit.Assert.assertNotNull;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.io.UnsupportedEncodingException;
> import org.apache.commons.csv.CSVFormat;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVRecord;
> import org.junit.Test;
> public class CSVRecordTest {
>   private static final CSVFormat CSV_FORMAT =
>       CSVFormat.EXCEL.withDelimiter('^').withFirstRecordAsHeader();
>
>   @Test
>   public void test() throws UnsupportedEncodingException, IOException {
>     InputStream pointsOfReference = getClass().getResourceAsStream("/optd_por_public.csv");
>     CSVParser parser = CSV_FORMAT.parse(new InputStreamReader(pointsOfReference, "UTF-8"));
>     for (CSVRecord record : parser) {
>       String locationType = record.get("location_type");
>       assertNotNull(locationType);
>     }
>   }
> }





[jira] [Commented] (CSV-196) Store the info of whether a field is enclosed by quotes

2016-10-07 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554422#comment-15554422
 ] 

Emmanuel Bourg commented on CSV-196:


Do I understand correctly that you need this info only for counting the number 
of bytes read?

> Store the info of whether a field is enclosed by quotes
> ---
>
> Key: CSV-196
> URL: https://issues.apache.org/jira/browse/CSV-196
> Project: Commons CSV
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 1.4
>Reporter: Matt Sun
>  Labels: easyfix, features, patch
> Fix For: Patch Needed
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be good for the CSVParser class to store whether a field 
> was enclosed by quotes in the original source file.
> For example, for this data sample:
> A, B, C
> a1, "b1", c1
> CSVParser gives us record a1, b1, c1, which is helpful because it parsed 
> double quotes, but we also lost the information of original data at the same 
> time. We can't tell from the CSVRecord returned whether the original data is 
> enclosed by double quotes or not.
> In our use case, we are integrating the Apache Hadoop APIs with Commons CSV.  CSV 
> is one kind of input to Hadoop jobs, which should support splitting the input 
> data. To accurately split a CSV file into pieces, we need to count the bytes 
> of data the CSVParser actually read. CSVParser doesn't have accurate information 
> about whether a field was enclosed by quotes, nor does it store the raw data of 
> the original source, so downstream users of the Commons CSVParser are not able to 
> get that info.
> To suggest a fix: extend the token/CSVRecord with a boolean field 
> indicating whether the column was enclosed by quotes. While the Lexer is doing 
> getNextToken, set the flag if a field is encapsulated and successfully parsed.
> I found another issue reporting a similar request, but it was marked as 
> resolved: [CSV-91] 
> https://issues.apache.org/jira/browse/CSV-91?jql=project%20%3D%20CSV%20AND%20text%20~%20%22with%20quotes%22
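The suggested fix can be illustrated with a toy splitter that records a per-field "quoted" flag alongside each value. This is a naive sketch of the idea, not Commons CSV internals; it ignores escaped quotes and delimiters inside quoted fields:

```java
import java.util.ArrayList;
import java.util.List;

// Each parsed field carries its value plus whether the original source
// enclosed it in double quotes - the information the issue asks to preserve.
public class QuoteAwareSplitter {

    static class Field {
        final String value;
        final boolean quoted;

        Field(String value, boolean quoted) {
            this.value = value;
            this.quoted = quoted;
        }
    }

    // Naive comma splitter for simple inputs like the a1, "b1", c1 sample above.
    static List<Field> split(String line) {
        List<Field> fields = new ArrayList<>();
        for (String raw : line.split(",")) {
            String t = raw.trim();
            boolean quoted = t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"");
            fields.add(new Field(quoted ? t.substring(1, t.length() - 1) : t, quoted));
        }
        return fields;
    }
}
```

With the sample row `a1, "b1", c1`, the middle field would come back with `value = "b1"` and `quoted = true`, while the other two report `quoted = false`.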





[jira] [Commented] (CSV-199) CSVFormat option to defend against CSV Excel Macro Injection (CEMI) attacks

2016-10-07 Thread Emmanuel Bourg (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554413#comment-15554413
 ] 

Emmanuel Bourg commented on CSV-199:


Hi Phil, thank you for the suggestion. I'm not convinced commons-csv should 
take the responsibility of cleaning the input. This cleaning seems only 
relevant if the file is then loaded by Excel, but here the file is actually 
processed by a Java application. Also, the EXCEL format is just a formatting 
convention; formulas could be added to any format, and I don't think we want to 
end up with MYSQL_WITHOUT_FORMULAS, DEFAULT_WITHOUT_FORMULAS, etc.

> CSVFormat option to defend against CSV Excel Macro Injection (CEMI) attacks
> ---
>
> Key: CSV-199
> URL: https://issues.apache.org/jira/browse/CSV-199
> Project: Commons CSV
>  Issue Type: New Feature
>  Components: Printer
>Affects Versions: 1.4
>Reporter: Phil Varner
>
> A common use for Commons CSV is to export user-generated data for analysis in 
> spreadsheet software like Excel.  One attack against this usage is for a user 
> to create data that appears to Excel as a formula, such that Excel executes 
> it.  For example, a simple non-malicious example of this is a CSV file like:
> {code}
> Name,Email,Favorite Color
> Aaron Aaronson,a...@example.com,=1+1
> {code}
> When opened, Excel will execute the macro and display "2".  A malicious 
> example could, for example, use "=cmd|' /C calc'!A0", causing a command 
> prompt to be opened. 
> This can be exploited with values starting with =, +, -, or @.
> This feature would add a flag to CSVFormat called "escapeFormulas" that would 
> defend against creating vulnerable CSV files like this by prepending a 
> single-quote to any CSV column value starting with the four aforementioned 
> characters.  Also added would be a predefined format EXCEL_WITHOUT_FORMULAS 
> that could be used for safely exporting data that was not intended to contain 
> formulas. 
> I believe it is important to add this as a feature to CSVFormat rather than 
> relying on users to manually escape formulas because many users do not know 
> about this security vulnerability, but would prefer to defend against it if 
> aware. 
> More information:
> https://www.owasp.org/index.php/CSV_Excel_Macro_Injection
> https://hackerone.com/reports/72785
> http://www.contextis.com/resources/blog/comma-separated-vulnerabilities/


