[jira] [Closed] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
[ https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Mittal closed COMPRESS-471.
----------------------------------
    Resolution: Workaround

Using the suggestions provided in the comments. Thanks.

> Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
> ---
>
>                 Key: COMPRESS-471
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-471
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.18
>            Reporter: Gaurav Mittal
>            Priority: Major
>         Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, correct.JPG
>
> All file names that are not UTF-8 encoded are replaced by the '?' symbol.
> In the issue scenario the charset is 'Cp850'. Since the Commons Compress
> library cannot identify the 'Cp850' charset, it falls back to the default
> charset 'UTF-8', and therefore we see the '?' symbol.
>
> In our code:
>
>     ZipFile ret = new ZipFile(path);
>
> If we pass the encoding to the constructor as shown below, it works fine:
>
>     ZipFile ret = new ZipFile(new File(path), "Cp850", false);
>
> But this second approach, where we force the encoding to 'Cp850', may cause
> side effects in some cases.
> --
> The code below does not seem to resolve UTF-8 conflicts and does not restore
> the file names to their correct form:
>
>     try {
>         final Map entriesWithoutUTF8Flag =
>             populateFromCentralDirectory();
>         resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
>         success = true;
>     } finally {
>         closed = !success;
>         if (!success && closeOnError) {
>             IOUtils.closeQuietly(archive);
>         }
>     }

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
[ https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698487#comment-16698487 ]

Gaurav Mittal commented on COMPRESS-471:
----------------------------------------

Thank you for your suggestion. I am closing this case.
[jira] [Created] (COMPRESS-472) 7Z Compress is very slow, appx. 1MB/second for mp4/mp3 files
Gaurav Mittal created COMPRESS-472:
--------------------------------------

             Summary: 7Z Compress is very slow, appx. 1MB/second for mp4/mp3 files
                 Key: COMPRESS-472
                 URL: https://issues.apache.org/jira/browse/COMPRESS-472
             Project: Commons Compress
          Issue Type: Bug
            Reporter: Gaurav Mittal

Hi,

I am using the Commons Compress library to compress files in .7z format.
Compression is very slow, close to 1 MB/second, for files that do not
compress any further, such as mp4 and mp3. This happens even with LZMA2 and
the compression preset set to 3 (the normal preset is 6). The same files
compress faster with the zip compressor.

Is there a way to compress such files into .7z format faster?

Thanks
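
One possible direction for the question above, offered only as a sketch and not as an answer given in this thread: media files such as mp4 and mp3 are already compressed, so an application could probe a sample of each file and skip compression when the sample barely shrinks. The class name `CompressibilityProbe`, the method name, and the threshold are hypothetical. The probe uses `java.util.zip.Deflater` from the JDK rather than LZMA2, on the assumption that data which Deflate cannot shrink will not shrink much under LZMA2 either.

```java
import java.util.zip.Deflater;

/** Hypothetical helper: estimates whether a data sample is worth compressing. */
final class CompressibilityProbe {
    private CompressibilityProbe() {}

    /**
     * Deflates the sample at the fastest level and reports whether the
     * compressed size is at least {@code threshold} times the original size.
     * An empty sample is reported as compressible (nothing to decide).
     */
    static boolean looksIncompressible(byte[] sample, double threshold) {
        if (sample.length == 0) {
            return false;
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(sample);
            deflater.finish();
            byte[] out = new byte[8192];
            long compressed = 0;
            // Drain the deflater; only the output size matters here.
            while (!deflater.finished()) {
                compressed += deflater.deflate(out);
            }
            return (double) compressed / sample.length >= threshold;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }
}
```

If `looksIncompressible(firstBytesOfFile, 0.95)` returns true, the application could store that file uncompressed. In Commons Compress this would presumably mean selecting `SevenZMethod.COPY` (for example through `SevenZOutputFile.setContentCompression`) instead of LZMA2; how that setting applies per archive or per entry should be checked against the library documentation.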
[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
[ https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697766#comment-16697766 ]

Gaurav Mittal commented on COMPRESS-471:
----------------------------------------

Hi,

I found the solution below:

{quote}
private boolean isUTF8Encoded(ZipFile zipFile) {
    boolean foundUTF8 = false;
    if (zipFile != null) {
        // Encoding the ZipFile was opened with.
        foundUTF8 = zipFile.getEncoding().equalsIgnoreCase("UTF8");
        Enumeration<ZipArchiveEntry> list = zipFile.getEntries();
        if (list != null && list.hasMoreElements()) {
            ZipArchiveEntry entry;
            // Only the first entry is inspected; its flag overrides the value above.
            if ((entry = list.nextElement()) != null)
                foundUTF8 = entry.getGeneralPurposeBit().usesUTF8ForNames(); // using GPB
        }
    }
    return foundUTF8;
}
{quote}

If the above method returns false, I can use the other ZipFile constructor
with the CP850 charset and get the desired file names.

Please let me know whether this approach is fine or not.

Thanks
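
For readers unfamiliar with the flag that `usesUTF8ForNames()` reports: in the ZIP format, bit 11 of the general purpose bit flag marks an entry's name and comment as UTF-8. The sketch below shows that bit test on a raw local file header using only the JDK. The class and method names are hypothetical and this is not the Commons Compress implementation. Note that a cleared bit only means the writer did not declare UTF-8; it does not say which charset was actually used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper illustrating the UTF-8 name flag of the ZIP format. */
final class ZipFlags {
    /** Bit 11 of the general purpose bit flag: names and comments are UTF-8. */
    static final int UTF8_NAMES_FLAG = 1 << 11;

    /** Local file header signature "PK\3\4", read as a little-endian int. */
    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    private ZipFlags() {}

    static boolean usesUtf8ForNames(int generalPurposeFlag) {
        return (generalPurposeFlag & UTF8_NAMES_FLAG) != 0;
    }

    /**
     * Reads the general purpose bit flag from a local file header that
     * starts at offset 0 of the given array. The flag is the 2-byte
     * little-endian field at offset 6, after the signature (4 bytes)
     * and the "version needed to extract" field (2 bytes).
     */
    static int readGeneralPurposeFlag(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("header too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("not a local file header");
        }
        return buf.getShort(6) & 0xFFFF;
    }
}
```

Because the flag is stored per entry, an archive can mix flagged and unflagged entries, so inspecting only the first entry may not be representative of the whole archive.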
[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
[ https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694557#comment-16694557 ]

Gaurav Mittal commented on COMPRESS-471:
----------------------------------------

I tried using the raw name, but that would require a lot of code changes,
and it would be difficult to manage them across the zip preview, the unzip
process, dialog boxes, and so on.

I would like the library to set a flag when a ZipFile contains non-UTF-8
characters. For example,

    foundUTF8 = zipFile.getEncoding().equalsIgnoreCase("UTF8");

could return true or false. Or there could be a boolean value that tells
whether non-UTF-8 characters are present in the zip file:

    foundUTF8 = zipFile.isUTF8Encoding(); // proposed library API

On that basis we would be able to make more robust changes.

We know that the ZipFile constructor has everything needed to detect
non-UTF-8 characters in file names, but it does not let client code make
use of it:

    private ZipFile(SeekableByteChannel channel, String archiveName, String encoding,
                    boolean useUnicodeExtraFields, boolean closeOnError) throws IOException {
        this.entries = new LinkedList();
        .
        try {
            Map entriesWithoutUTF8Flag = this.populateFromCentralDirectory();
            ..
        }
        .
    }

Could you please make changes in the library or suggest some other way
(other than the raw name)?
[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
[ https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691230#comment-16691230 ]

Gaurav Mittal commented on COMPRESS-471:
----------------------------------------

We are concerned about cases where the CP850 encoding is not needed but we
apply it anyway, so that file names may end up containing undesirable
characters.

Query: Is there any way to know that a particular zip file contains
non-UTF-8 characters in its file names? If so, we can handle it at the
application level. Currently I do not see any method that can tell me about
non-UTF-8 characters, and hence we are not able to decide when to apply
UTF-8 and when to apply another character encoding.

Is it possible to fix this on the library side?

Thanks
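
The question of when to apply UTF-8 and when to apply another encoding can be approached at the application level by testing the raw name bytes. The sketch below uses only the JDK: it decodes strictly as UTF-8 and falls back to a caller-supplied charset when the bytes are malformed. The class and method names are hypothetical, and it assumes the application can obtain the undecoded name bytes of each entry (for instance from `ZipArchiveEntry.getRawName()` in Commons Compress).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decode a raw entry name, preferring UTF-8. */
final class ZipNameDecoder {
    private ZipNameDecoder() {}

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] raw) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes as UTF-8 when the bytes are valid UTF-8, else with the fallback. */
    static String decodeName(byte[] raw, Charset fallback) {
        Charset charset = isValidUtf8(raw) ? StandardCharsets.UTF_8 : fallback;
        return new String(raw, charset);
    }
}
```

A call such as `ZipNameDecoder.decodeName(rawName, Charset.forName("Cp850"))` would then apply Cp850 only to names that are not valid UTF-8, provided the running JDK ships that charset. This is a heuristic: pure ASCII names decode the same either way, and a legacy-encoded name can occasionally happen to be valid UTF-8, so the result is a best guess rather than a guarantee.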
[jira] [Created] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
Gaurav Mittal created COMPRESS-471:
--------------------------------------

             Summary: Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.
                 Key: COMPRESS-471
                 URL: https://issues.apache.org/jira/browse/COMPRESS-471
             Project: Commons Compress
          Issue Type: Bug
    Affects Versions: 1.18
            Reporter: Gaurav Mittal
         Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, correct.JPG

All file names that are not UTF-8 encoded are replaced by the '?' symbol.
In the issue scenario the charset is 'Cp850'. Since the Commons Compress
library cannot identify the 'Cp850' charset, it falls back to the default
charset 'UTF-8', and therefore we see the '?' symbol.

In our code:

    ZipFile ret = new ZipFile(path);

If we pass the encoding to the constructor as shown below, it works fine:

    ZipFile ret = new ZipFile(new File(path), "Cp850", false);

But this second approach, where we force the encoding to 'Cp850', may cause
side effects in some cases.
--
The code below does not seem to resolve UTF-8 conflicts and does not restore
the file names to their correct form:

    try {
        final Map entriesWithoutUTF8Flag =
            populateFromCentralDirectory();
        resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
        success = true;
    } finally {
        closed = !success;
        if (!success && closeOnError) {
            IOUtils.closeQuietly(archive);
        }
    }